Distributed Tracing in Spring Boot 3.x: Micrometer Tracing + Zipkin
Master distributed tracing in Spring Boot 3.
- Spring Boot 3.x uses Micrometer Tracing (not Spring Cloud Sleuth) with Zipkin via micrometer-tracing-bridge-brave or micrometer-tracing-bridge-otel
- TraceId and spanId are propagated automatically across Feign, WebClient, and builder-created RestTemplate instances without code changes; Kafka additionally needs observation enabled
- Add %X{traceId} and %X{spanId} to your logback pattern to correlate logs with Zipkin traces
- Set management.tracing.sampling.probability=1.0 for full sampling in dev; use 0.1 (10%) or lower in high-throughput production
- Choose B3 propagation (Zipkin-native) or W3C Trace Context (the OpenTelemetry standard and the Spring Boot 3.x default) based on your ecosystem
Imagine a restaurant where a customer order travels through the host, waiter, kitchen, and bar before arriving at the table. Distributed tracing gives every person handling that order the same ticket number (traceId) and their own step number (spanId). Zipkin is the manager's dashboard that shows the full journey of every order — where it spent the most time, where it failed, and who handled it at each step — so you can find the bottleneck without interrogating every staff member individually.
A user reports that their checkout takes 8 seconds. Your payment microservice logs show no slow queries. Order service looks fine. Inventory service appears healthy. The request touches seven services — which one is actually slow? Without distributed tracing, answering this question means correlating timestamps across seven separate log files, praying that clocks are synchronized, and manually reconstructing the call graph. This is the problem that distributed tracing was built to solve.
Spring Cloud Sleuth was the go-to distributed tracing solution for Spring Boot 2.x. It automatically injected traceId and spanId into MDC, propagated trace context over HTTP headers, and exported spans to Zipkin with minimal configuration. But Sleuth stops at Spring Boot 2.x: it does not support the Spring Framework 6 and Jakarta EE 9+ baseline that Spring Boot 3.0 introduced. The replacement is Micrometer Tracing, a vendor-neutral tracing facade that ships as part of the Micrometer observability stack.
Micrometer Tracing follows the same philosophy as Micrometer Metrics: one API, multiple backends. You write your code against the Micrometer Tracing API, and a bridge library translates to either OpenTelemetry (OTEL) or Brave (the Zipkin tracing library). This separation means you can switch from Zipkin to Jaeger, Grafana Tempo, or Honeycomb without changing application code; only the bridge and exporter dependencies and their configuration change.
Auto-instrumentation in Spring Boot 3.x is comprehensive. Spring MVC controllers, Spring WebFlux handlers, Feign clients, WebClient and RestTemplate (when built through the auto-configured builders), JDBC queries (via datasource-micrometer), and scheduled tasks gain span creation and context propagation without changes to business code. Kafka producers and consumers are instrumented as well, once observation is enabled on them. TraceId and spanId appear in MDC automatically, enabling log correlation without custom filters.
Sampling is the critical operational knob. Tracing every request at 1.0 probability generates enormous data volumes in production and adds measurable latency overhead from span serialization and network calls to the Zipkin collector. Configuring an appropriate sampling rate — and understanding tail-based vs head-based sampling — is essential knowledge for running tracing in production without impacting service performance or collector infrastructure costs.
This guide covers the complete Micrometer Tracing + Zipkin stack for Spring Boot 3.x: dependency configuration, MDC log integration, cross-service propagation mechanics, Feign and Kafka tracing, sampling strategies, Zipkin UI navigation, B3 vs W3C propagation, and the failure modes that trip up teams migrating from Sleuth.
Dependencies and Configuration: Migrating from Sleuth to Micrometer Tracing
The first step for any Spring Boot 3.x tracing setup is understanding the new dependency structure. Spring Cloud Sleuth is not compatible with Spring Boot 3.x — it depends on Spring Boot 2.x and Spring Framework 5. Any Sleuth dependency in a Spring Boot 3 project will cause classpath conflicts.
Micrometer Tracing is the direct replacement. The core tracing API is in micrometer-tracing. The backend bridge is either micrometer-tracing-bridge-brave (for Zipkin/Brave) or micrometer-tracing-bridge-otel (for OpenTelemetry). For Zipkin reporting, add zipkin-reporter-brave with the Brave bridge, or opentelemetry-exporter-zipkin with the OTel bridge. Spring Boot 3.x provides auto-configuration for all of these through spring-boot-actuator, so you only need the bridge and reporter on the classpath.
The spring-boot-starter-actuator must be present for the auto-configuration to activate. Spring Boot 3.x detects the tracing bridge on the classpath and automatically configures a Tracer, ObservationRegistry, and span export pipeline. The configuration properties moved from spring.sleuth.* to management.tracing.* and management.zipkin.tracing.*.
For projects using Spring Cloud's bill-of-materials, spring-cloud-starter-sleuth should be replaced with the Micrometer Tracing bridge and reporter dependencies described above. Spring Cloud 2022.0.x (Kilburn) and later releases do not include Sleuth; Micrometer Tracing is the standard tracing mechanism.
Micrometer Observations are the unified abstraction in Spring Boot 3.x. An Observation wraps a unit of work and automatically creates metrics (via Micrometer), traces (via Micrometer Tracing), and log correlation (via MDC). Spring's own framework instrumentation (MVC handlers, WebClient, Feign, Kafka) uses Observations internally, which is why auto-instrumentation works without code changes. Custom instrumentation should also use the Observation API rather than directly creating Spans for consistency.
MDC Log Correlation: Linking Logs to Traces
Distributed tracing only delivers its full value when traces are correlated with logs. A Zipkin trace tells you which service was slow; the correlated logs tell you exactly what was happening inside that service at that moment. Micrometer Tracing automatically populates MDC (Mapped Diagnostic Context) with traceId and spanId so any SLF4J-compatible logger can include them in log output.
The MDC keys in Spring Boot 3.x with Micrometer Tracing are traceId (a 128-bit or 64-bit trace ID in hex) and spanId (the current span ID). Fields such as parentId or sampled are not guaranteed to be present by default, so check what your bridge and version actually populate before relying on them in a log pattern. These keys replace the X-B3-TraceId, X-B3-SpanId, and X-B3-ParentSpanId MDC keys used by older Sleuth versions.
Configuring the log pattern is straightforward in Spring Boot: set logging.pattern.level or use a logback-spring.xml with %X{traceId} in the pattern. The key insight is that the MDC is populated for the duration of a span — if your code runs outside a span context (e.g., in a background thread that was not explicitly propagated), MDC values will be empty.
Thread propagation is the main MDC pitfall. When using @Async methods, CompletableFuture, or a custom ExecutorService, MDC is not automatically copied to child threads. For @Async, Spring Framework's ContextPropagatingTaskDecorator (available since 6.1 and built on Micrometer's context-propagation library) wraps the task to copy the tracing context, and therefore the MDC values, to the async thread. Configure this on your async executor.
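To make the failure mode concrete, here is a minimal JDK-only sketch of what a context-copying task decorator does. It is not the Spring or Micrometer implementation: the CONTEXT thread-local is a hypothetical stand-in for the logging MDC, and withContext is an illustrative helper name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

final class ContextCopySketch {
    // Hypothetical stand-in for the logging MDC: a per-thread map of diagnostic keys.
    static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    // Captures the caller's context now and restores it on whichever thread runs the task.
    static Runnable withContext(Runnable task) {
        Map<String, String> captured = new HashMap<>(CONTEXT.get());
        return () -> {
            Map<String, String> previous = CONTEXT.get();
            CONTEXT.set(captured);
            try {
                task.run();
            } finally {
                CONTEXT.set(previous); // do not leak context into pooled threads
            }
        };
    }

    // Runs a task on a separate worker thread and reports the traceId that thread saw.
    static String traceIdSeenByWorker(boolean decorate) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            AtomicReference<String> seen = new AtomicReference<>();
            Runnable task = () -> seen.set(CONTEXT.get().get("traceId"));
            pool.submit(decorate ? withContext(task) : task).get(5, TimeUnit.SECONDS);
            return seen.get();
        } catch (Exception e) {
            throw new IllegalStateException("worker task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        CONTEXT.get().put("traceId", "4bf92f3577b34da6a3ce929d0e0e4736");
        System.out.println("plain task:     " + traceIdSeenByWorker(false));
        System.out.println("decorated task: " + traceIdSeenByWorker(true));
    }
}
```

The undecorated task runs on a thread whose context map is empty, which is exactly how a log line ends up without a traceId. In a real application the decorator configured on the executor plays the role of withContext.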
For reactive WebFlux applications, MDC does not work with the traditional thread-local model because reactive chains may switch threads multiple times. Micrometer Tracing with WebFlux uses Reactor's Context for propagation. Spring Boot 3.x includes a Reactor Context-based MDC adapter that populates MDC when a subscriber processes a signal, but custom operators or blocking calls that escape the reactive context will not have MDC populated. Log statements inside Mono/Flux operators will have correct traceId values; log statements in manually spawned threads will not.
Code that runs in @Async methods, CompletableFuture.runAsync(), or custom thread pools will have empty MDC (no traceId in logs) unless ContextPropagatingTaskDecorator is configured on the executor. This is one of the most common log correlation failures.
Cross-Service Propagation: Feign, WebClient, and Kafka
Micrometer Tracing's value comes from seamless context propagation across service boundaries. The mechanism is different for each transport, and understanding how it works for each helps diagnose propagation failures.
For HTTP clients, Micrometer Tracing provides instrumentation that injects B3 or W3C headers into outgoing requests and extracts them from incoming requests. RestTemplate is instrumented through an ObservationRegistry, which Spring Boot 3.x sets up automatically when the RestTemplate is created via RestTemplateBuilder. For WebClient, the auto-configured WebClientCustomizer adds a ClientRequestObservationConvention that instruments each request. For Feign clients, spring-cloud-starter-openfeign auto-configures tracing when Micrometer Tracing and the feign-micrometer module are on the classpath.
For Kafka, tracing requires spring-kafka 3.0+ with observation explicitly enabled. The KafkaTemplate is instrumented to add trace headers (b3 or traceparent) to ProducerRecords. On the consumer side, the KafkaListenerContainerFactory must have observation enabled so it extracts trace headers from ConsumerRecords and creates child spans. This creates a trace graph that spans across Kafka topics, showing the producer send as the parent of the consumer process.
For RabbitMQ (Spring AMQP), the RabbitTemplate and SimpleMessageListenerContainer are similarly instrumented via Micrometer Observations in Spring AMQP 3.0+. The same enablement pattern applies.
A common scenario is a gateway or API proxy that strips custom headers. If your Nginx, Envoy, or AWS ALB strips X-B3-* headers, downstream services receive requests without trace context and create new root traces. Configure your proxies to allow-list B3 or W3C propagation headers. For Envoy-based service meshes (Istio), Envoy propagates B3 headers but only to the next hop — your application code must still propagate them from incoming request to outgoing calls. Micrometer Tracing handles this automatically for instrumented clients.
Calling new RestTemplate() bypasses Spring Boot's auto-configuration, and the resulting instance will not have the Micrometer Tracing instrumentation. Always inject RestTemplateBuilder and call builder.build() to get a properly instrumented instance. Using new RestTemplate() or un-configured Kafka factories silently breaks propagation.
Sampling Strategies: Balancing Observability and Overhead
Sampling determines what fraction of requests generate traces. At management.tracing.sampling.probability=1.0, every request is traced. At 0.1, one in ten requests is traced. At 0.01, one in one hundred. The probability must balance observability needs against three costs: CPU overhead of span creation, network overhead of exporting spans to Zipkin, and storage costs in the Zipkin backend.
Head-based sampling (what Micrometer Tracing implements by default) makes the sampling decision at the trace root — typically the first service to receive a request. This decision is propagated downstream via the X-B3-Sampled header (B3) or the sampled flag in the traceparent header (W3C). All downstream services honor the sampling decision from the root, ensuring complete traces are either fully sampled or fully dropped — never partial.
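The head-based rule above can be sketched in a few lines of plain Java. This is an illustration only, not Brave's or Micrometer's sampler: it assumes header names arrive in their canonical casing and handles just the "1" and "0" values of X-B3-Sampled. The random source is injected so the behaviour is reproducible.

```java
import java.util.Map;
import java.util.function.DoubleSupplier;

final class HeadSamplingSketch {
    // Returns true when this request's spans should be recorded and exported.
    static boolean isSampled(Map<String, String> inboundHeaders,
                             double probability,
                             DoubleSupplier random) {
        String upstreamDecision = inboundHeaders.get("X-B3-Sampled");
        if (upstreamDecision != null) {
            // A decision was already made at the trace root: honor it,
            // so the trace is either complete or absent, never partial.
            return upstreamDecision.equals("1");
        }
        // No inbound decision: this service is the root and decides once.
        return random.getAsDouble() < probability;
    }

    public static void main(String[] args) {
        // Downstream service with a 0% local probability still records the trace.
        System.out.println(isSampled(Map.of("X-B3-Sampled", "1"), 0.0, Math::random));
        // Downstream service with a 100% local probability still drops it.
        System.out.println(isSampled(Map.of("X-B3-Sampled", "0"), 1.0, Math::random));
    }
}
```

The local probability only matters at the root of a trace. Raising it on a downstream service does not produce more complete traces.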
The right sampling rate depends on request volume. For a service handling 1,000 RPS, a 10% sampling rate gives 100 traces per second — plenty for debugging. At 10,000 RPS, even 1% (100 traces/sec) may stress your Zipkin collector. At 100 RPS, consider 100% sampling for complete visibility. Rule of thumb: target 10-100 traces per second entering your Zipkin collector from the fleet as a whole.
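The arithmetic behind that rule of thumb is simple enough to keep in a helper. The sketch below uses hypothetical method names and assumes a steady request rate; it converts between a sampling probability and the resulting trace rate.

```java
final class SamplingMath {
    // Expected traces per second produced by probability sampling.
    static double tracesPerSecond(double requestsPerSecond, double probability) {
        return requestsPerSecond * probability;
    }

    // Probability needed to reach a target trace rate, capped at 1.0 (100%).
    static double probabilityFor(double requestsPerSecond, double targetTracesPerSecond) {
        if (requestsPerSecond <= 0) {
            return 1.0; // no traffic: sampling everything costs nothing
        }
        return Math.min(1.0, targetTracesPerSecond / requestsPerSecond);
    }

    public static void main(String[] args) {
        System.out.println("1,000 RPS at 10%: " + tracesPerSecond(1_000, 0.10) + " traces/s");
        System.out.println("10,000 RPS, target 50 traces/s: p = " + probabilityFor(10_000, 50));
        System.out.println("100 RPS, target 50 traces/s: p = " + probabilityFor(100, 50));
    }
}
```

Because the target is fleet-wide, requestsPerSecond should be the rate at which new traces start across all entry-point services, not the rate seen by a single instance.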
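Custom sampler beans give finer control. You can implement a Sampler that samples 100% of requests containing an X-Debug-Trace header (for on-demand tracing of specific users) and 5% of normal requests. Capturing 100% of requests that result in errors is harder, because a head-based sampler decides before the outcome is known; error-driven capture needs either an export decision deferred until the request finishes or tail-based sampling in a collector. This tail-informed head-based hybrid is a common production pattern: guaranteed error trace capture with reduced load from healthy request traffic.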
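One way to reason about these rules is to separate the decision made when a request arrives from the decision made when it finishes. The sketch below is plain Java with hypothetical names, not a Brave or Micrometer Sampler implementation. It assumes spans are recorded locally for every request and only the export step is gated, so the error rule covers the local service only; downstream services that already dropped the trace will not contribute spans.

```java
final class HybridSamplingSketch {
    // Head decision: made when the request arrives, before the outcome is known.
    // roll is a uniform random number in [0, 1), passed in for reproducibility.
    static boolean sampleAtHead(boolean hasDebugHeader, double baselineProbability, double roll) {
        return hasDebugHeader || roll < baselineProbability;
    }

    // Export decision: made when the request finishes, so failures can still be kept.
    static boolean exportAtEnd(boolean sampledAtHead, boolean endedInError) {
        return sampledAtHead || endedInError;
    }

    public static void main(String[] args) {
        boolean head = sampleAtHead(false, 0.05, Math.random());
        System.out.println("healthy request exported: " + exportAtEnd(head, false));
        System.out.println("failed request exported:  " + exportAtEnd(head, true));
    }
}
```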
For latency-sensitive services, measure tracing overhead empirically. Span creation is typically sub-microsecond. The cost comes from span export: the asynchronous reporter in zipkin-reporter-brave buffers spans and sends them in batches off the request thread, adding minimal latency to request processing. However, if the Zipkin collector is unavailable and the buffer fills, the reporter will drop spans rather than block, which is the correct behavior for a non-critical observability path.
Rate limiting samplers provide a fixed number of traces per second regardless of traffic volume. They are not exposed through Spring Boot's sampling probability property, but Brave ships brave.sampler.RateLimitingSampler and a custom one is easy to implement. At 1,000 RPS, a 10-traces-per-second rate limiter provides 1% sampling naturally; at 100 RPS, it provides 10%. This is more predictable than probability-based sampling for resource planning.
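A minimal fixed-window version of such a sampler might look like the following. It is a sketch under simplifying assumptions, not a drop-in Sampler bean: the class name is hypothetical, the clock is injected for testability, and the whole budget can be spent at the start of each one-second window, whereas a production sampler would usually spread it more evenly.

```java
import java.util.function.LongSupplier;

final class RateLimitingSamplerSketch {
    private static final long ONE_SECOND_NANOS = 1_000_000_000L;

    private final int tracesPerSecond;
    private final LongSupplier nanoClock;
    private long windowStart;
    private int usedInWindow;

    RateLimitingSamplerSketch(int tracesPerSecond, LongSupplier nanoClock) {
        this.tracesPerSecond = tracesPerSecond;
        this.nanoClock = nanoClock;
        this.windowStart = nanoClock.getAsLong();
    }

    // Accepts up to tracesPerSecond requests per one-second window, then rejects the rest.
    synchronized boolean isSampled() {
        long now = nanoClock.getAsLong();
        if (now - windowStart >= ONE_SECOND_NANOS) {
            windowStart = now; // start a new window with a fresh budget
            usedInWindow = 0;
        }
        if (usedInWindow < tracesPerSecond) {
            usedInWindow++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        RateLimitingSamplerSketch sampler =
                new RateLimitingSamplerSketch(10, System::nanoTime);
        int sampled = 0;
        for (int i = 0; i < 1_000; i++) {
            if (sampler.isSampled()) {
                sampled++;
            }
        }
        System.out.println(sampled + " of 1000 burst requests sampled");
    }
}
```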
B3 vs W3C Propagation: Choosing the Right Header Format
Trace context propagation headers carry the traceId, spanId, and sampling decision across service boundaries. Two competing formats exist: B3 (Zipkin's original format) and W3C Trace Context (a W3C Recommendation, adopted by OpenTelemetry). Choosing the wrong one for your ecosystem causes disconnected traces at service boundaries.
B3 propagation uses multiple headers: X-B3-TraceId (32 hex chars for 128-bit, or 16 for 64-bit), X-B3-SpanId (16 hex chars), X-B3-ParentSpanId (16 hex chars), and X-B3-Sampled (1 or 0). B3 also has a single-header variant: b3: {traceId}-{spanId}-{samplingState}-{parentSpanId}. B3 is the native format of the Brave library. Spring Boot 3.x's auto-configuration, however, defaults to W3C for both bridges, so set management.tracing.propagation.type=b3 to use B3 with micrometer-tracing-bridge-brave.
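The two B3 encodings carry the same fields, which a short sketch makes visible. The helper names below are illustrative and no validation is performed; the IDs in main are arbitrary example values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class B3HeadersSketch {
    // Single-header form: {traceId}-{spanId}-{samplingState}[-{parentSpanId}]
    static String single(String traceId, String spanId, boolean sampled, String parentSpanId) {
        String value = traceId + "-" + spanId + "-" + (sampled ? "1" : "0");
        return parentSpanId == null ? value : value + "-" + parentSpanId;
    }

    // Multi-header form: one X-B3-* header per field; the parent is omitted for root spans.
    static Map<String, String> multi(String traceId, String spanId, boolean sampled, String parentSpanId) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("X-B3-TraceId", traceId);
        headers.put("X-B3-SpanId", spanId);
        if (parentSpanId != null) {
            headers.put("X-B3-ParentSpanId", parentSpanId);
        }
        headers.put("X-B3-Sampled", sampled ? "1" : "0");
        return headers;
    }

    public static void main(String[] args) {
        String traceId = "80f198ee56343ba864fe8b2a57d3eff7";
        String spanId = "e457b5a2e4d86bd1";
        String parentSpanId = "05e3ac9a4f6e3b90";
        System.out.println("b3: " + single(traceId, spanId, true, parentSpanId));
        multi(traceId, spanId, true, parentSpanId)
                .forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```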
W3C TraceContext uses two headers: traceparent (format: 00-{traceId}-{spanId}-{flags}) and tracestate (vendor-specific additional state). The traceparent header is standardized by the W3C and is the default for OpenTelemetry. W3C is the default for the OTEL bridge (micrometer-tracing-bridge-otel).
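As an illustration of the traceparent layout, the sketch below parses a version 00 header with plain Java. It is deliberately strict and simplified, with hypothetical class and method names: it accepts only version 00, ignores tracestate, and rejects all-zero IDs, which the specification treats as invalid.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TraceparentSketch {
    // version "00", 32-hex trace ID, 16-hex parent span ID, 2-hex flags
    private static final Pattern VERSION_00 =
            Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

    static final class Context {
        final String traceId;
        final String parentSpanId;
        final boolean sampled;

        Context(String traceId, String parentSpanId, boolean sampled) {
            this.traceId = traceId;
            this.parentSpanId = parentSpanId;
            this.sampled = sampled;
        }
    }

    // Returns the parsed context, or null when the header is missing or malformed.
    static Context parse(String header) {
        if (header == null) {
            return null;
        }
        Matcher m = VERSION_00.matcher(header.trim());
        if (!m.matches()) {
            return null;
        }
        String traceId = m.group(1);
        String parentSpanId = m.group(2);
        if (traceId.matches("0+") || parentSpanId.matches("0+")) {
            return null; // all-zero IDs are invalid
        }
        int flags = Integer.parseInt(m.group(3), 16);
        boolean sampled = (flags & 0x01) != 0; // lowest bit is the sampled flag
        return new Context(traceId, parentSpanId, sampled);
    }

    public static void main(String[] args) {
        Context ctx = parse("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
        System.out.println(ctx.traceId + " / " + ctx.parentSpanId + " / sampled=" + ctx.sampled);
    }
}
```

A receiver that gets null back from a parser like this has no context to continue and starts a new root trace, which is what a disconnected trace looks like from the application's side.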
The practical implication: if every service is configured for B3, B3 is the natural choice. If any service uses an OpenTelemetry SDK (perhaps a Node.js or Python service instrumented with OTEL), that service will emit W3C headers by default. A Spring service configured to read only B3 will not continue a trace that arrives with only a W3C traceparent header; it starts a new root trace instead.
You can configure Micrometer Tracing to support both B3 and W3C simultaneously using a composite propagation format. This is the most flexible configuration for polyglot environments. Incoming requests are checked for both header formats; outgoing requests send both.
Istio/Envoy service meshes use B3 headers by default for their distributed tracing integration. If you are using Istio and your Spring services use the OTEL bridge with W3C propagation, trace context will not be linked to Istio's Kiali service graph. Align your propagation format with your service mesh's configuration.
Zipkin UI Navigation and Trace Analysis
Zipkin's UI is the primary tool for trace investigation but its features are not always obvious. Understanding how to navigate it effectively reduces the time to root-cause a production issue from minutes to seconds.
The search page (/) accepts service name, span name, remote service name, annotation, tag key-value, duration (min/max), and time range. The most powerful filter combination for debugging is: service=payment-service, minDuration=1000ms (finding all traces slower than 1 second). This immediately surfaces the slow outliers without scrolling through hundreds of normal traces.
The trace detail page shows a waterfall diagram. The total trace duration is the span of time from the leftmost span's start to the rightmost span's end. Each bar represents a span. The colors indicate the service (Zipkin assigns colors consistently per service name). A wide gap between the start of a parent's call and the start of the child span represents network latency between the parent and the child service. A parent span that is mostly empty, with only a small child span inside it, means the parent spent most of its time on work that no child span accounts for; look at what happened before the child call and between the child returning and the parent span ending.
Annotations (small dots on spans) represent timestamped events. cs (client send), cr (client receive), ss (server send), sr (server receive) are the standard Brave annotations from HTTP calls. The gap between cs and sr is network latency; the gap between sr and ss is the server processing time; the gap between ss and cr is network latency back.
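The gaps described above are plain subtractions on the annotation timestamps. In the sketch below (hypothetical helper names), timestamps are epoch microseconds, the unit Zipkin uses. One caveat is that cs and cr come from the client's clock while sr and ss come from the server's, so the one-way gaps are distorted by clock skew. The round-trip figure avoids that by comparing two durations that were each measured on a single clock.

```java
final class AnnotationGapsSketch {
    // One-way gaps: these mix client and server clocks, so clock skew distorts them.
    static long outboundNetworkMicros(long cs, long sr) {
        return sr - cs;
    }

    static long returnNetworkMicros(long ss, long cr) {
        return cr - ss;
    }

    // Server processing time: both timestamps come from the server's clock.
    static long serverProcessingMicros(long sr, long ss) {
        return ss - sr;
    }

    // Total time on the wire: client-observed duration minus server-observed duration.
    // Each duration uses one clock, so skew between hosts cancels out.
    static long roundTripNetworkMicros(long cs, long sr, long ss, long cr) {
        return (cr - cs) - (ss - sr);
    }

    public static void main(String[] args) {
        long cs = 1_716_400_000_000_000L; // client send
        long sr = cs + 3_000;             // server receive
        long ss = cs + 45_000;            // server send
        long cr = cs + 49_000;            // client receive
        System.out.println("server processing: " + serverProcessingMicros(sr, ss) + " us");
        System.out.println("network round trip: " + roundTripNetworkMicros(cs, sr, ss, cr) + " us");
    }
}
```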
The dependency graph (/dependency) shows the service-to-service call graph derived from trace data, with call counts and error rates on each edge. This is invaluable for identifying which upstream service most frequently calls the slow one, helping prioritize optimization work.
For production debugging, combine Zipkin search with log correlation: find a slow trace in Zipkin, copy the traceId, then query your log aggregator (Kibana, CloudWatch Logs Insights, Loki) for traceId='...' to see every log statement associated with that specific request across all services. This provides the full picture: the timing from Zipkin and the context from logs.
The Invisible Span: Kafka Consumer Missing from Zipkin Traces
- Kafka tracing requires explicit opt-in: observation must be enabled on both the KafkaTemplate and the listener container factory.
- Always verify end-to-end trace propagation through each transport type (HTTP, Kafka, RabbitMQ) in a staging environment after initial setup — missing spans appear as disconnected traces, not errors.
curl -s http://zipkin:9411/api/v2/services | jq .
curl -s -X POST http://zipkin:9411/api/v2/spans -H 'Content-Type: application/json' -d '[{"traceId":"aabbccdd","id":"aabbccdd","name":"test","timestamp":1716400000000000,"duration":1000}]'
Key takeaways
Common mistakes to avoid
- Using spring-cloud-starter-sleuth with Spring Boot 3.x
- Using new RestTemplate() instead of RestTemplateBuilder.build(): inject RestTemplateBuilder and call builder.build(). The builder is auto-configured with the Micrometer Tracing instrumentation
- Not enabling Kafka listener observation
- Leaving sampling.probability=1.0 in production
- Mixing B3 and W3C propagation formats across services
- Not copying MDC to async threads
- Querying Zipkin with wrong service name casing
Interview Questions on This Topic
Why was Spring Cloud Sleuth deprecated and what replaces it in Spring Boot 3.x?
Frequently Asked Questions