Senior 10 min · May 23, 2026

Distributed Tracing in Spring Boot 3.x: Micrometer Tracing + Zipkin

Master distributed tracing in Spring Boot 3.

Naren · Founder
Quick Answer
  • Spring Boot 3.x uses Micrometer Tracing (not Spring Cloud Sleuth) with Zipkin via micrometer-tracing-bridge-brave or otel
  • TraceId and spanId are propagated automatically across Feign, WebClient, and RestTemplate without code changes; Kafka needs observation enabled on the template and the listener
  • Add %X{traceId} and %X{spanId} to your logback pattern to correlate logs with Zipkin traces
  • Set management.tracing.sampling.probability=1.0 for full sampling in dev; use 0.1 (10%) or lower in high-throughput production
  • Choose B3 propagation (Zipkin-native) or W3C TraceContext (the OpenTelemetry standard, and Spring Boot's default propagation type) based on your ecosystem
✦ Definition · ~90s read
What is Distributed Tracing in Spring Boot 3.x?

Micrometer Tracing is the distributed tracing component of the Micrometer observability library, introduced as the successor to Spring Cloud Sleuth for Spring Boot 3.x. It provides a Tracer abstraction with two bridge implementations: micrometer-tracing-bridge-brave (wrapping Zipkin's Brave library) and micrometer-tracing-bridge-otel (wrapping OpenTelemetry SDK).

The bridge pattern means application code uses Micrometer's Tracer/Span API while the bridge handles serialization format, propagation headers, and exporter protocol.

A trace represents a complete request journey across all participating services, identified by a traceId (128-bit hex string). Each unit of work within the trace is a span, identified by a spanId (64-bit hex string) and linked to its parent span via a parentSpanId.
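
To make the ID sizes concrete, here is a minimal sketch in plain Java of what a 128-bit traceId and a 64-bit spanId look like as hex strings. It is illustrative only: the class and method names are hypothetical, and in a real application the tracer (Brave or OpenTelemetry) generates these values for you.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative only: real tracers generate these IDs; nothing here is a Micrometer API. */
final class TraceIdSketch {

    /** 128-bit trace ID rendered as 32 lowercase hex characters. */
    static String newTraceId() {
        ThreadLocalRandom random = ThreadLocalRandom.current();
        return toHex16(random.nextLong()) + toHex16(random.nextLong());
    }

    /** 64-bit span ID rendered as 16 lowercase hex characters. */
    static String newSpanId() {
        return toHex16(ThreadLocalRandom.current().nextLong());
    }

    /** Zero-pads a 64-bit value to exactly 16 hex characters. */
    private static String toHex16(long value) {
        return String.format("%016x", value);
    }

    public static void main(String[] args) {
        String traceId = newTraceId();
        String rootSpanId = newSpanId();
        String childSpanId = newSpanId();
        // The child span shares the traceId and records the root span as its parent.
        System.out.println("root : traceId=" + traceId + " spanId=" + rootSpanId);
        System.out.println("child: traceId=" + traceId + " spanId=" + childSpanId
                + " parentSpanId=" + rootSpanId);
    }
}
```

Every span in one request carries the same traceId; only the spanId and parentSpanId change from hop to hop, which is what lets Zipkin rebuild the tree.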

Spans carry a start time, duration, service name, operation name, status (OK or ERROR), and arbitrary key-value tags. The collection of spans sharing a traceId forms a tree that Zipkin renders as a waterfall diagram.

Context propagation is the mechanism that threads a traceId through service boundaries. When service A calls service B over HTTP, the outgoing request includes B3 headers (X-B3-TraceId, X-B3-SpanId, X-B3-ParentSpanId, X-B3-Sampled) or W3C TraceContext headers (traceparent, tracestate).

Service B extracts these headers and creates a child span that continues the same trace. For message-based communication (Kafka, RabbitMQ), trace context is propagated via message headers using the same mechanism. Micrometer Tracing instruments all major Spring communication abstractions — WebClient, RestTemplate, Feign, Kafka — to perform this propagation automatically.

Plain-English First

Imagine a restaurant where a customer order travels through the host, waiter, kitchen, and bar before arriving at the table. Distributed tracing gives every person handling that order the same ticket number (traceId) and their own step number (spanId). Zipkin is the manager's dashboard that shows the full journey of every order — where it spent the most time, where it failed, and who handled it at each step — so you can find the bottleneck without interrogating every staff member individually.

A user reports that their checkout takes 8 seconds. Your payment microservice logs show no slow queries. Order service looks fine. Inventory service appears healthy. The request touches seven services — which one is actually slow? Without distributed tracing, answering this question means correlating timestamps across seven separate log files, praying that clocks are synchronized, and manually reconstructing the call graph. This is the problem that distributed tracing was built to solve.

Spring Cloud Sleuth was the go-to distributed tracing solution for Spring Boot 2.x. It automatically injected traceId and spanId into MDC, propagated trace context over HTTP headers, and exported spans to Zipkin with minimal configuration. But Sleuth is end-of-life as of Spring Boot 3.0: it does not support the new Spring Framework 6 baseline (Java 17 and Jakarta EE 9+). The replacement is Micrometer Tracing, a vendor-neutral tracing facade that ships as part of the Micrometer observability stack.

Micrometer Tracing follows the same philosophy as Micrometer Metrics: one API, multiple backends. You write your code against the Micrometer Tracing API, and a bridge library translates to either OpenTelemetry (OTEL) or Brave (the Zipkin tracing library). This separation means you can switch from Zipkin to Jaeger, Grafana Tempo, or Honeycomb without changing application code — only the bridge dependency changes.

Auto-instrumentation in Spring Boot 3.x is comprehensive. Feign clients, WebClient, RestTemplate, Spring MVC controllers, Spring WebFlux handlers, JDBC queries (via datasource-micrometer), and scheduled tasks gain span creation and context propagation with zero code changes. Kafka producers and consumers are instrumented too, but only once observation is enabled on the template and the listener container. TraceId and spanId appear in MDC automatically, enabling log correlation without custom filters.

Sampling is the critical operational knob. Tracing every request at 1.0 probability generates enormous data volumes in production and adds measurable latency overhead from span serialization and network calls to the Zipkin collector. Configuring an appropriate sampling rate — and understanding tail-based vs head-based sampling — is essential knowledge for running tracing in production without impacting service performance or collector infrastructure costs.

This guide covers the complete Micrometer Tracing + Zipkin stack for Spring Boot 3.x: dependency configuration, MDC log integration, cross-service propagation mechanics, Feign and Kafka tracing, sampling strategies, Zipkin UI navigation, B3 vs W3C propagation, and the failure modes that trip up teams migrating from Sleuth.

Dependencies and Configuration: Migrating from Sleuth to Micrometer Tracing

The first step for any Spring Boot 3.x tracing setup is understanding the new dependency structure. Spring Cloud Sleuth is not compatible with Spring Boot 3.x — it depends on Spring Boot 2.x and Spring Framework 5. Any Sleuth dependency in a Spring Boot 3 project will cause classpath conflicts.

Micrometer Tracing is the direct replacement. The core tracing API is in micrometer-tracing. The backend bridge is either micrometer-tracing-bridge-brave (for Zipkin/Brave) or micrometer-tracing-bridge-otel (for OpenTelemetry). For Zipkin reporting, add zipkin-reporter-brave. Spring Boot 3.x provides auto-configuration for all of these through spring-boot-actuator — you just need the bridge and reporter on the classpath.

The spring-boot-starter-actuator must be present for the auto-configuration to activate. Spring Boot 3.x detects the tracing bridge on the classpath and automatically configures a Tracer, ObservationRegistry, and span export pipeline. The configuration properties moved from spring.sleuth.* to management.tracing.* and management.zipkin.tracing.*.
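
As a rough starting point, the properties named in this guide can be combined into a small application.properties for local development. This is a sketch, not a complete configuration: the service name is a hypothetical placeholder, and the endpoint shown is simply the default restated for visibility.

```properties
# Hypothetical service name; Zipkin displays it as the owner of this application's spans.
spring.application.name=order-service
# Trace every request while developing (the default is 0.1).
management.tracing.sampling.probability=1.0
# Where finished spans are sent; this is also the default value.
management.zipkin.tracing.endpoint=http://localhost:9411/api/v2/spans
```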

For projects using Spring Cloud's bill-of-materials, spring-cloud-starter-sleuth should be replaced with the Micrometer Tracing starters. The Spring Cloud 2022.x (Kilburn) and later releases do not include Sleuth — Micrometer Tracing is the standard tracing mechanism.

Micrometer Observations are the unified abstraction in Spring Boot 3.x. An Observation wraps a unit of work and automatically creates metrics (via Micrometer), traces (via Micrometer Tracing), and log correlation (via MDC). Spring's own framework instrumentation (MVC handlers, WebClient, Feign, Kafka) uses Observations internally, which is why auto-instrumentation works without code changes. Custom instrumentation should also use the Observation API rather than directly creating Spans for consistency.

Do Not Mix Sleuth and Micrometer Tracing
spring-cloud-starter-sleuth and micrometer-tracing-bridge-brave cannot coexist in the same application. If you are migrating a Spring Boot 2.x application to 3.x, remove all Sleuth dependencies first and replace them with the Micrometer Tracing equivalents.
Production Insight
The Spring Boot 3.x auto-configuration for Micrometer Tracing activates only when spring-boot-starter-actuator is present along with a tracing bridge. If traces are not appearing, check that actuator is not excluded from the classpath in production builds.
Key Takeaway
Spring Boot 3.x tracing requires micrometer-tracing-bridge-brave + zipkin-reporter-brave (or otel equivalents) alongside spring-boot-starter-actuator — remove Spring Cloud Sleuth entirely as it is incompatible with Spring Boot 3.x.

MDC Log Correlation: Linking Logs to Traces

Distributed tracing only delivers its full value when traces are correlated with logs. A Zipkin trace tells you which service was slow; the correlated logs tell you exactly what was happening inside that service at that moment. Micrometer Tracing automatically populates MDC (Mapped Diagnostic Context) with traceId, spanId, parentId, and sampled so any SLF4J-compatible logger can include them in log output.

The MDC keys in Spring Boot 3.x with Micrometer Tracing are: traceId (full 128-bit or 64-bit hex trace ID), spanId (current span ID), parentId (parent span ID, absent for root spans), and sampled (whether this trace is being exported to Zipkin). Early Sleuth releases exposed X-B3-TraceId, X-B3-SpanId, and X-B3-ParentSpanId as MDC keys, so log patterns carried over from that era must be updated to the names above.

Configuring the log pattern is straightforward in Spring Boot: set logging.pattern.level or use a logback-spring.xml with %X{traceId} in the pattern. The key insight is that the MDC is populated for the duration of a span — if your code runs outside a span context (e.g., in a background thread that was not explicitly propagated), MDC values will be empty.
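
A minimal sketch of the property-based option, using the pattern this guide recommends. The `:-` suffix is Logback's default-value syntax and is added here as an assumption about what you want to see when no span is active (an empty slot rather than a literal placeholder).

```properties
# Adds "[traceId,spanId]" after the log level; the slots stay empty outside a span.
logging.pattern.level=%5p [%X{traceId:-},%X{spanId:-}]
```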

Thread propagation is the main MDC pitfall. When using @Async methods, CompletableFuture, or a custom ExecutorService, MDC is not automatically copied to child threads. For @Async, Spring Framework's ContextPropagatingTaskDecorator (available from Spring Framework 6.1 and built on Micrometer's context-propagation library) wraps each task so the tracing context, and therefore the MDC values, are restored on the async thread. Configure it on your async executor.
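
The mechanics are easier to see without any framework involved. The sketch below uses a plain ThreadLocal as a stand-in for MDC and a hand-written wrapper as a stand-in for a task decorator; all names are hypothetical and this is not how ContextPropagatingTaskDecorator is implemented internally, only the same capture-then-restore idea.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Stand-in demo: a ThreadLocal plays the role of MDC, propagating() the role of a task decorator. */
final class ContextCopySketch {

    /** Per-thread slot holding the current traceId, like an MDC entry. */
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    /** Captures the caller's traceId now and restores it around the task later. */
    static Runnable propagating(Runnable task) {
        String captured = TRACE_ID.get(); // read on the submitting thread
        return () -> {
            String previous = TRACE_ID.get();
            TRACE_ID.set(captured);
            try {
                task.run();
            } finally {
                // Restore so pooled threads do not leak context between tasks.
                if (previous == null) {
                    TRACE_ID.remove();
                } else {
                    TRACE_ID.set(previous);
                }
            }
        };
    }

    /** Runs a task on a worker thread and returns the traceId it observed there. */
    static String traceIdSeenOnWorker(boolean propagate) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        String[] seen = new String[1];
        Runnable task = () -> seen[0] = TRACE_ID.get();
        try {
            pool.submit(propagate ? propagating(task) : task).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        TRACE_ID.set("4bf92f3577b34da6a3ce929d0e0e4736");
        System.out.println("without wrapper: " + traceIdSeenOnWorker(false)); // null
        System.out.println("with wrapper:    " + traceIdSeenOnWorker(true));  // the traceId set above
    }
}
```

The worker thread sees nothing unless the value is captured on the submitting thread and re-installed on the worker, which is exactly the gap a task decorator closes for @Async executors.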

For reactive WebFlux applications, MDC does not work with the traditional thread-local model because reactive chains may switch threads multiple times. Micrometer Tracing with WebFlux uses Reactor's Context for propagation. Spring Boot 3.x includes a Reactor Context-based MDC adapter that populates MDC when a subscriber processes a signal, but custom operators or blocking calls that escape the reactive context will not have MDC populated. Log statements inside Mono/Flux operators will have correct traceId values; log statements in manually spawned threads will not.

MDC Is Thread-Local: Async Threads Lose Trace Context
Code running in @Async methods, CompletableFuture.runAsync(), or custom thread pools will have empty MDC (no traceId in logs) unless ContextPropagatingTaskDecorator is configured on the executor. This is one of the most common log correlation failures.
Production Insight
In JSON-structured logging (Logstash encoder), MDC fields are automatically included in every log event as top-level JSON fields. This makes querying logs by traceId in Elasticsearch/Kibana or CloudWatch Logs Insights trivial: just filter on traceId = '0123abc...' to find all log events for a specific trace.
Key Takeaway
MDC traceId/spanId correlation bridges logs and traces — but it only works within the span's thread context. Always configure ContextPropagatingTaskDecorator for async executors and verify MDC propagation for each async mechanism your service uses.

Cross-Service Propagation: Feign, WebClient, and Kafka

Micrometer Tracing's value comes from seamless context propagation across service boundaries. The mechanism is different for each transport, and understanding how it works for each helps diagnose propagation failures.

For HTTP clients, Spring's Observation-based instrumentation injects B3 or W3C headers into outgoing requests and extracts them from incoming requests. For RestTemplate, Spring Boot 3.x applies an ObservationRestTemplateCustomizer to every instance created via RestTemplateBuilder, which registers the ObservationRegistry on the template. For WebClient, the auto-configured WebClientCustomizer does the same for instances created from the injected WebClient.Builder. For Feign clients, spring-cloud-starter-openfeign auto-configures tracing when Micrometer Tracing is on the classpath.

For Kafka, tracing requires spring-kafka 3.0+ with observation explicitly enabled. The KafkaTemplate is instrumented to add trace headers (b3 or traceparent) to ProducerRecords. On the consumer side, the KafkaListenerContainerFactory must have observation enabled so it extracts trace headers from ConsumerRecords and creates child spans. This creates a trace graph that spans across Kafka topics, showing the producer send as the parent of the consumer process.
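
For auto-configured Kafka beans, the opt-in described above comes down to two properties, both named elsewhere in this guide. Manually constructed templates or container factories are not affected by these properties and need observation enabled in code instead.

```properties
# Producer side: KafkaTemplate adds trace headers to outgoing records.
spring.kafka.template.observation-enabled=true
# Consumer side: listener containers extract those headers and continue the trace.
spring.kafka.listener.observation-enabled=true
```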

For RabbitMQ (Spring AMQP), the RabbitTemplate and SimpleMessageListenerContainer are similarly instrumented via Micrometer Observations in Spring AMQP 3.0+. The same enablement pattern applies.

A common scenario is a gateway or API proxy that strips custom headers. If your Nginx, Envoy, or AWS ALB strips X-B3-* headers, downstream services receive requests without trace context and create new root traces. Configure your proxies to allow-list B3 or W3C propagation headers. For Envoy-based service meshes (Istio), Envoy propagates B3 headers but only to the next hop — your application code must still propagate them from incoming request to outgoing calls. Micrometer Tracing handles this automatically for instrumented clients.

Never Use new RestTemplate() in Spring Boot 3.x
Creating RestTemplate with new RestTemplate() bypasses Spring Boot's auto-configuration and will not have the Micrometer Tracing interceptor. Always inject RestTemplateBuilder and call builder.build() to get a properly instrumented instance.
Production Insight
Test end-to-end trace propagation through each transport type (HTTP, Kafka, RabbitMQ) by making a single request that traverses all of them, then looking up the traceId in Zipkin. If the request shows up as several separate traces with different traceIds instead of one connected trace, you have a propagation gap at the transport boundary where the trace splits.
Key Takeaway
Context propagation is automatic for RestTemplateBuilder-created clients, WebClient.Builder-created clients, Feign, and Kafka (with observation-enabled=true) — but using new RestTemplate() or un-configured Kafka factories silently breaks propagation.

Sampling Strategies: Balancing Observability and Overhead

Sampling determines what fraction of requests generate traces. At management.tracing.sampling.probability=1.0, every request is traced. At 0.1, one in ten requests is traced. At 0.01, one in one hundred. The probability must balance observability needs against three costs: CPU overhead of span creation, network overhead of exporting spans to Zipkin, and storage costs in the Zipkin backend.

Head-based sampling (what Micrometer Tracing implements by default) makes the sampling decision at the trace root — typically the first service to receive a request. This decision is propagated downstream via the X-B3-Sampled header (B3) or the sampled flag in the traceparent header (W3C). All downstream services honor the sampling decision from the root, ensuring complete traces are either fully sampled or fully dropped — never partial.

The right sampling rate depends on request volume. For a service handling 1,000 RPS, a 10% sampling rate gives 100 traces per second, which is plenty for debugging. At 10,000 RPS, 1% already yields the same 100 traces per second, and anything higher may stress your Zipkin collector. At 100 RPS, consider 100% sampling for complete visibility. Rule of thumb: target 10-100 traces per second entering your Zipkin collector from the fleet as a whole.
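
The arithmetic behind these numbers is simple enough to script when planning collector capacity. The helper below is a plain-Java sketch with hypothetical names; it only restates the relationship traces/sec = RPS × probability and its inverse.

```java
/** Back-of-the-envelope sampling arithmetic; not part of any tracing library. */
final class SamplingMath {

    /** Expected sampled traces per second for a request rate and sampling probability. */
    static double tracesPerSecond(double requestsPerSecond, double probability) {
        return requestsPerSecond * probability;
    }

    /** Probability needed to hit a target trace rate, capped at 1.0 (100% sampling). */
    static double probabilityFor(double requestsPerSecond, double targetTracesPerSecond) {
        if (requestsPerSecond <= 0) {
            return 1.0; // no traffic: sampling everything costs nothing
        }
        return Math.min(1.0, targetTracesPerSecond / requestsPerSecond);
    }

    public static void main(String[] args) {
        System.out.printf("1,000 RPS at 10%%: %.0f traces/sec%n", tracesPerSecond(1_000, 0.10));
        System.out.printf("10,000 RPS at 1%%: %.0f traces/sec%n", tracesPerSecond(10_000, 0.01));
        System.out.printf("probability for 50 traces/sec at 10,000 RPS: %.4f%n",
                probabilityFor(10_000, 50));
    }
}
```

The resulting probability is what you would set as management.tracing.sampling.probability on the services that start traces.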

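Custom sampler beans give finer control. You can implement a Sampler that samples 100% of requests containing an X-Debug-Trace header (for on-demand tracing of specific users) and 5% of normal requests. Capturing 100% of requests that result in errors is a different problem: a head-based sampler decides before the outcome is known, so guaranteed error capture needs tail-based sampling in the collection pipeline, or a higher head-based rate on the routes where errors matter most. Combining a header-triggered rule with a low base rate is a common production pattern that keeps the load from healthy traffic down.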
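
The decision logic for such a rule can be sketched in plain Java. This is deliberately not written against Brave's Sampler or SamplerFunction types: the class name, the header map, and the injectable random source are all assumptions made so the rule can be read and tested in isolation. The error case is left out because a request's outcome is not known when this decision is made.

```java
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.DoubleSupplier;

/** Hypothetical head-based rule: always sample debug-flagged requests, otherwise use a base rate. */
final class HeadSamplingRule {

    private static final String DEBUG_HEADER = "X-Debug-Trace";

    private final double baseProbability;
    private final DoubleSupplier random; // injectable so the rule is deterministic in tests

    HeadSamplingRule(double baseProbability, DoubleSupplier random) {
        this.baseProbability = baseProbability;
        this.random = random;
    }

    /** Decides at request start, before the outcome of the request is known. */
    boolean shouldSample(Map<String, String> headers) {
        for (String name : headers.keySet()) {
            if (name.equalsIgnoreCase(DEBUG_HEADER)) {
                return true; // on-demand tracing for a specific caller
            }
        }
        return random.getAsDouble() < baseProbability;
    }

    public static void main(String[] args) {
        HeadSamplingRule rule =
                new HeadSamplingRule(0.05, () -> ThreadLocalRandom.current().nextDouble());
        System.out.println("debug request sampled: "
                + rule.shouldSample(Map.of("X-Debug-Trace", "1")));
        System.out.println("normal request sampled: " + rule.shouldSample(Map.of()));
    }
}
```

In a real service the same logic would live behind whatever sampler hook your tracing bridge exposes, and the decision would then be propagated downstream through the sampled flag.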

For latency-sensitive services, measure tracing overhead empirically. Span creation itself is cheap. The larger cost is span export: the asynchronous reporter in zipkin-reporter-brave buffers spans and sends them in batches off the request thread, so it adds minimal latency to request processing. However, if the Zipkin collector is unavailable and the buffer fills, the reporter drops spans rather than blocking, which is the correct behavior for a non-critical observability path.

Rate-limiting samplers provide a fixed number of traces per second regardless of traffic volume. Spring Boot exposes no configuration property for one, but Brave ships brave.sampler.RateLimitingSampler and the logic is easy to implement yourself. At 1,000 RPS, a 10-traces-per-second rate limiter works out to 1% sampling; at 100 RPS, it works out to 10%. This is more predictable than probability-based sampling for resource planning.
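
To show the idea, here is a minimal fixed-window rate limiter in plain Java. It is a sketch under simplifying assumptions (one-second windows, a single lock, an injectable clock for testing) and is not Brave's implementation; the class name is hypothetical.

```java
import java.util.function.LongSupplier;

/** Hypothetical fixed-window limiter: at most N sampled traces per one-second window. */
final class RateLimitingSamplerSketch {

    private static final long WINDOW_NANOS = 1_000_000_000L;

    private final int tracesPerSecond;
    private final LongSupplier nanoClock; // injectable; use System::nanoTime in real code
    private long windowStartNanos;
    private int sampledInWindow;

    RateLimitingSamplerSketch(int tracesPerSecond, LongSupplier nanoClock) {
        this.tracesPerSecond = tracesPerSecond;
        this.nanoClock = nanoClock;
        this.windowStartNanos = nanoClock.getAsLong();
    }

    /** Returns true for at most tracesPerSecond calls within each window. */
    synchronized boolean trySample() {
        long now = nanoClock.getAsLong();
        if (now - windowStartNanos >= WINDOW_NANOS) {
            windowStartNanos = now; // start a fresh window and reset the budget
            sampledInWindow = 0;
        }
        if (sampledInWindow < tracesPerSecond) {
            sampledInWindow++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        RateLimitingSamplerSketch sampler = new RateLimitingSamplerSketch(10, System::nanoTime);
        int sampled = 0;
        for (int i = 0; i < 1_000; i++) {
            if (sampler.trySample()) {
                sampled++;
            }
        }
        System.out.println("sampled " + sampled + " of 1000 back-to-back requests");
    }
}
```

A fixed window lets a burst at the start of each second use up the whole budget; a token bucket smooths that out at the cost of slightly more state.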

Default Sampling Rate Is 10% in Spring Boot 3.x
Like Sleuth, which defaulted to 10% sampling, Spring Boot 3.x defaults to 10% (management.tracing.sampling.probability=0.1). In a low-traffic dev environment, you may trigger a request 5 times and never see a trace in Zipkin. Set 1.0 in dev environments.
Production Insight
Use 100% sampling for batch jobs, critical payment flows, and admin operations regardless of the global rate. Implement a SamplerFunction that checks request path or a custom header to bump specific routes to 1.0 probability.
Key Takeaway
Head-based sampling ensures complete traces are either fully captured or fully dropped — never partial. Set 100% in dev, tune production sampling based on target traces-per-second for your Zipkin infrastructure, and implement custom samplers for always-on error tracing.

B3 vs W3C Propagation: Choosing the Right Header Format

Trace context propagation headers carry the traceId, spanId, and sampling decision across service boundaries. Two competing standards exist: B3 (Zipkin's original format) and W3C TraceContext (the W3C Recommendation adopted by OpenTelemetry). Choosing the wrong one for your ecosystem causes disconnected traces at service boundaries.

B3 propagation uses multiple headers: X-B3-TraceId (32 hex chars for 128-bit, or 16 for 64-bit), X-B3-SpanId (16 hex chars), X-B3-ParentSpanId (16 hex chars), and X-B3-Sampled (1 or 0). B3 also has a single-header variant: b3: {traceId}-{spanId}-{samplingState}-{parentSpanId}. B3 is the native format of the Brave library. Note that Spring Boot's own auto-configuration defaults management.tracing.propagation.type to W3C even with the Brave bridge (micrometer-tracing-bridge-brave), so set B3 explicitly if you need it.

W3C TraceContext uses two headers: traceparent (format: 00-{traceId}-{spanId}-{flags}) and tracestate (vendor-specific additional state). The traceparent header is standardized by the W3C and is the default for OpenTelemetry. W3C is the default for the OTEL bridge (micrometer-tracing-bridge-otel).
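
Both formats carry the same three facts, which a short parser makes visible. The sketch below is plain Java with hypothetical names and minimal validation; the instrumentation does this for you in a real service, and the sample value is the example from the W3C Trace Context specification.

```java
/** Illustrative parser for the header layouts described above; not a library API. */
final class PropagationHeaderSketch {

    /** Minimal trace context: the two IDs plus the sampling decision. */
    record TraceContext(String traceId, String spanId, boolean sampled) {}

    /** Parses a W3C traceparent value laid out as version-traceId-spanId-flags. */
    static TraceContext parseTraceparent(String header) {
        String[] parts = header.trim().split("-");
        if (parts.length != 4 || parts[1].length() != 32 || parts[2].length() != 16) {
            throw new IllegalArgumentException("Malformed traceparent: " + header);
        }
        // Bit 0 of the flags byte is the "sampled" flag.
        boolean sampled = (Integer.parseInt(parts[3], 16) & 0x01) == 1;
        return new TraceContext(parts[1], parts[2], sampled);
    }

    /** Renders the same context as a B3 single header: traceId-spanId-samplingState. */
    static String toB3Single(TraceContext context) {
        return context.traceId() + "-" + context.spanId() + "-" + (context.sampled() ? "1" : "0");
    }

    public static void main(String[] args) {
        TraceContext context =
                parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
        System.out.println("traceparent -> " + context);
        System.out.println("b3          -> " + toB3Single(context));
    }
}
```

Because the payload is equivalent, accepting both formats is a configuration question rather than a data-model one; what breaks traces is a receiver that only looks for the header the sender did not emit.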

The practical implication: if every service in the fleet is configured for B3, B3 works end to end. If any service uses an OpenTelemetry SDK (perhaps a Node.js or Python service instrumented with OTEL), it will emit W3C headers by default. A Spring service configured to consume only B3 will ignore a W3C traceparent header and start a new trace unless it is also configured to understand W3C.

You can configure Micrometer Tracing to support both B3 and W3C simultaneously using a composite propagation format. This is the most flexible configuration for polyglot environments. Incoming requests are checked for both header formats; outgoing requests send both.
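
A sketch of the composite setup as properties. The property names follow this guide's production note; as far as I know the separate consume and produce settings arrived in Spring Boot 3.1, so treat their availability and the exact accepted values as assumptions to check against your Boot version.

```properties
# Accept trace context in either format on incoming requests.
management.tracing.propagation.consume=B3,W3C
# Emit both formats on outgoing requests so any downstream service can continue the trace.
management.tracing.propagation.produce=B3,W3C
```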

Istio/Envoy service meshes use B3 headers by default for their distributed tracing integration. If you are using Istio and your Spring services use the OTEL bridge with W3C propagation, trace context will not be linked to Istio's Kiali service graph. Align your propagation format with your service mesh's configuration.

Mismatched Propagation Formats Silently Break Trace Correlation
A service that consumes only B3 will not extract trace context from W3C traceparent headers, and vice versa. The receiving service creates a new root trace, breaking the end-to-end trace chain. Nothing in Zipkin flags this: you just see two separate traces for what should be one request.
Production Insight
In a Kubernetes environment with Istio, set management.tracing.propagation.consume=B3,W3C and management.tracing.propagation.produce=B3,W3C on all services. This handles both Istio's B3 headers and any OTEL-instrumented services emitting W3C headers without needing to coordinate a fleet-wide propagation format migration.
Key Takeaway
B3 is Zipkin/Brave native; W3C TraceContext is the W3C standard used by OpenTelemetry. Use the composite consume+produce configuration in polyglot environments to accept both formats, and align with your service mesh's tracing configuration.

Zipkin UI Navigation and Trace Analysis

Zipkin's UI is the primary tool for trace investigation but its features are not always obvious. Understanding how to navigate it effectively reduces the time to root-cause a production issue from minutes to seconds.

The search page (/) accepts service name, span name, remote service name, annotation, tag key-value, duration (min/max), and time range. The most powerful filter combination for debugging is: service=payment-service, minDuration=1000ms (finding all traces slower than 1 second). This immediately surfaces the slow outliers without scrolling through hundreds of normal traces.

The trace detail page shows a waterfall diagram. The total trace duration runs from the start of the earliest span to the end of the latest one. Each bar represents a span, and the colors indicate the service (Zipkin assigns colors consistently per service name). A wide gap between the start of a client span and the start of the matching server span represents network latency between the caller and the called service. A long parent span with only a short child span means the parent spent most of its time on work that no child span covers: look at what the parent was doing before the child started and after it returned.

Annotations (small dots on spans) represent timestamped events. cs (client send), cr (client receive), ss (server send), sr (server receive) are the standard Brave annotations from HTTP calls. The gap between cs and sr is network latency; the gap between sr and ss is the server processing time; the gap between ss and cr is network latency back.
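
The three gaps can be computed directly from the four timestamps. The helper below is a plain-Java sketch with hypothetical names, using microsecond timestamps as Zipkin does. Keep in mind that cs and cr come from the client's clock while sr and ss come from the server's, so the two network figures are only meaningful when the hosts' clocks are synchronized.

```java
/** Splits one client/server call into its three phases from the four standard annotations. */
final class AnnotationTimings {

    /** Durations in microseconds. */
    record Breakdown(long requestNetworkMicros, long serverMicros, long responseNetworkMicros) {}

    /**
     * @param cs client send, @param sr server receive,
     * @param ss server send, @param cr client receive (all epoch microseconds)
     */
    static Breakdown breakdown(long cs, long sr, long ss, long cr) {
        return new Breakdown(sr - cs, ss - sr, cr - ss);
    }

    public static void main(String[] args) {
        // Hypothetical timestamps, relative to the client send for readability.
        Breakdown b = breakdown(0, 5_000, 45_000, 52_000);
        System.out.println("request network:  " + b.requestNetworkMicros() + " us");  // 5000
        System.out.println("server work:      " + b.serverMicros() + " us");          // 40000
        System.out.println("response network: " + b.responseNetworkMicros() + " us"); // 7000
    }
}
```

The server figure uses two readings from the same clock, so it stays trustworthy even when the network figures are distorted by skew.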

The dependency graph (/dependency) shows the service-to-service call graph derived from trace data, with call counts and error rates on each edge. This is invaluable for identifying which upstream service most frequently calls the slow one, helping prioritize optimization work.

For production debugging, combine Zipkin search with log correlation: find a slow trace in Zipkin, copy the traceId, then query your log aggregator (Kibana, CloudWatch Logs Insights, Loki) for traceId='...' to see every log statement associated with that specific request across all services. This provides the full picture: the timing from Zipkin and the context from logs.

High-Cardinality Tags Can Overwhelm Zipkin Storage
Tags like userId, orderId, or sessionId are high-cardinality (many unique values). Adding them to every span can exhaust Zipkin's storage and slow down search queries. With the Observation API, attach such values with highCardinalityKeyValue so they reach traces only, and reserve lowCardinalityKeyValue for bounded values that are also safe as metric tags. Configure Zipkin's storage backend (Elasticsearch/Cassandra) with appropriate retention policies.
Production Insight
Zipkin has no built-in alerting, so set up an external check (or a query in your log aggregator) that fires when error-tagged traces exceed a threshold. If you are not using a commercial APM tool, query Zipkin's API hourly for traces with error tags and push the counts to your alerting system.
Key Takeaway
Zipkin's minDuration filter + error annotation search are your two most powerful debugging tools. Correlate Zipkin traceId with log aggregator queries to combine timing data from Zipkin with contextual data from logs for complete root-cause analysis.
● Production incident · POST-MORTEM · severity: high

The Invisible Span: Kafka Consumer Missing from Zipkin Traces

Symptom
In Zipkin, the order-service trace ended at the Kafka producer send. A completely separate trace appeared for fulfillment-service processing the same order message, with no parent-child relationship between them. The 4-second gap was invisible in Zipkin's dependency graph, making it impossible to trace an end-to-end order flow.
Assumption
The team assumed Micrometer Tracing automatically propagated trace context to Kafka messages the same way it did for HTTP calls, since Kafka support was listed in the documentation.
Root cause
The Kafka consumer was using a manually created @KafkaListener with a KafkaListenerContainerFactory that was not configured with the Micrometer observation integration. The spring-kafka version in use (2.9.x) required explicitly enabling observation: containerFactory.getContainerProperties().setObservationEnabled(true). Without this, the consumer created new root spans instead of child spans of the producer's trace.
Fix
Added observationEnabled=true to the KafkaListenerContainerFactory bean and upgraded to spring-kafka 3.0+, where Micrometer observation is supported but still has to be switched on. Also set spring.kafka.listener.observation-enabled=true in application.yml for auto-configured factories. After the fix, Zipkin showed the complete order flow from HTTP ingress through Kafka to fulfillment processing as a single trace.
Key lesson
  • Kafka tracing requires explicit opt-in via observation configuration on the container factory, and on the KafkaTemplate for the producer side.
  • Always verify end-to-end trace propagation through each transport type (HTTP, Kafka, RabbitMQ) in a staging environment after initial setup — missing spans appear as disconnected traces, not errors.
Production debug guide: Symptom → root cause → fix (6 entries)
Symptom · 01
No traces appear in Zipkin UI despite application receiving requests
Fix
First verify Zipkin is reachable from the application: check management.zipkin.tracing.endpoint is correctly set (default: http://localhost:9411/api/v2/spans). Confirm management.tracing.sampling.probability is greater than 0.0 — the default in Spring Boot 3.x is 0.1 (10% sampling), so low-traffic testing may result in no sampled requests. Set to 1.0 for debugging. Enable DEBUG logging for io.micrometer.tracing to see span creation and export events. Verify the micrometer-tracing-bridge-brave and zipkin-reporter-brave dependencies are both on the classpath.
Symptom · 02
traceId appears in logs but not in Zipkin for the same request
Fix
The trace is being created locally but not exported. This usually means the Zipkin reporter is failing silently. Enable DEBUG for zipkin2.reporter to see export failures. Check if the application can reach Zipkin's API endpoint with curl http://zipkin:9411/api/v2/spans. Verify there are no firewall rules blocking port 9411. Check if the Spring Boot app is being shut down before the async reporter flushes — Zipkin's OkHttp reporter batches spans and may lose the last batch on ungraceful shutdown; configure spring.lifecycle.timeout-per-shutdown-phase to allow flush time.
Symptom · 03
Spans from different services appear in Zipkin but not connected in the same trace
Fix
Context propagation is failing between services. Verify both services use the same propagation format: B3 or W3C (Spring Boot's default). If the services are configured differently, set an explicit format on both, for example management.tracing.propagation.type=B3. For Feign clients, verify spring-cloud-starter-openfeign includes the micrometer auto-instrumentation. For Kafka, ensure spring.kafka.listener.observation-enabled=true. To test manual propagation, send a request with fixed B3 headers (X-B3-TraceId and X-B3-SpanId are both required for the context to be accepted) and check whether downstream services report spans under that traceId.
Symptom · 04
MDC traceId is null in log output even though tracing is configured
Fix
The logging pattern must include %X{traceId} and %X{spanId}. Verify your logback-spring.xml or logback.xml uses these MDC keys. In Spring Boot 3.x with Micrometer Tracing, the MDC keys are traceId, spanId, parentId, and sampled, so check that your pattern uses exactly these names. Logback's AsyncAppender is rarely the cause: it captures the MDC on the logging thread before handing the event off. The usual culprit is application code running on a thread that never received the tracing context, such as an @Async method or a custom executor. For reactive WebFlux applications, MDC does not propagate automatically through reactive chains; use Reactor's contextWrite and MDC.put patterns.
Symptom · 05
Sampling rate set to 1.0 but only some requests appear in Zipkin
Fix
If an upstream service (API gateway, load balancer) sends X-B3-Sampled: 0, the downstream service honors that sampling decision and will not export the span regardless of local sampling configuration. This is correct behavior: sampling decisions are made at the trace root and propagated. Check your API gateway's sampling configuration. If you need to override upstream sampling decisions (not recommended), note that a Sampler bean is only consulted when a request arrives without a decision, so overriding an explicit X-B3-Sampled: 0 means removing or rewriting that header before the trace context is extracted.
Symptom · 06
Zipkin shows extremely high latency for a service but logs show fast response times
Fix
Check if clock skew exists between the services. Zipkin's waterfall view depends on accurate clock synchronization (NTP). A service with a clock 500ms ahead will appear to take 500ms longer. Verify NTP configuration on all nodes. Also check if the Zipkin span is capturing database or external call time that the application logs do not include. The Zipkin span start/end may correctly represent wall-clock time including connection pool wait time that does not appear in application-level log statements.
★ Debug Cheat Sheet: Shell commands for diagnosing Micrometer Tracing and Zipkin issues
Traces not reaching Zipkin
Immediate action
Test Zipkin API reachability and verify app exports spans
Commands
curl -s http://zipkin:9411/api/v2/services | jq .
curl -s -X POST http://zipkin:9411/api/v2/spans -H 'Content-Type: application/json' -d '[{"traceId":"aabbccddeeff0011","id":"aabbccddeeff0011","name":"test","timestamp":1716400000000000,"duration":1000}]'
Fix now
Set management.tracing.sampling.probability=1.0 and management.zipkin.tracing.endpoint=http://zipkin:9411/api/v2/spans in application.yml
Missing traceId in application logs
Immediate action
Check logback pattern and MDC keys
Commands
curl -s http://my-service:8080/actuator/loggers/io.micrometer.tracing | jq .
kubectl logs my-service-pod | grep -E 'traceId|spanId|TraceContext' | head -20
Fix now
Add %X{traceId} %X{spanId} to logback pattern in logback-spring.xml or logging.pattern.level=%5p [%X{traceId},%X{spanId}]
Disconnected traces between services (context propagation failure)
Immediate action
Verify propagation headers are being sent by the calling service
Commands
curl -v -H 'X-B3-TraceId: 0123456789abcdef0123456789abcdef' -H 'X-B3-SpanId: 0123456789abcdef' -H 'X-B3-Sampled: 1' http://my-service:8080/api/test 2>&1 | grep -i 'b3\|trace\|span'
curl -s 'http://zipkin:9411/api/v2/traces?serviceName=my-service&limit=5' | jq '.[0][].traceId'
Fix now
Ensure both services use the same propagation format: management.tracing.propagation.type=B3 (or W3C) on all services
Kafka spans not showing in traces
Immediate action
Check Kafka listener observation configuration
Commands
kubectl exec -it my-pod -- curl -s localhost:8080/actuator/beans | jq '.beans | to_entries[] | select(.value.type | contains("KafkaListenerContainer"))'
grep -r 'observationEnabled\|observation-enabled' /app/config/
Fix now
Set spring.kafka.listener.observation-enabled=true in application.yml and ensure spring-kafka >= 3.0.0 is used
Tracing Solution Comparison: Spring Boot 2.x vs 3.x
Aspect | Spring Cloud Sleuth (Boot 2.x) | Micrometer Tracing (Boot 3.x)
Dependency | spring-cloud-starter-sleuth | micrometer-tracing-bridge-brave + zipkin-reporter-brave
Config prefix | spring.sleuth.* | management.tracing.* + management.zipkin.tracing.*
Bootstrap | Auto (spring-cloud) | Auto (spring-boot-actuator + bridge on classpath)
Feign support | Auto with spring-cloud-openfeign | Auto with spring-cloud-openfeign 4.x
Kafka tracing | Auto with spring-kafka | Requires spring.kafka.listener.observation-enabled=true
MDC keys | X-B3-TraceId, X-B3-SpanId | traceId, spanId, parentId
Default sampling | 10% (0.1) | 10% (0.1)
Backend options | Zipkin, Jaeger (via reporter) | Any (Zipkin, Jaeger, OTEL Collector, Grafana Tempo)
API | Brave Tracer directly | Micrometer Observation + Tracer abstraction
Spring Boot 3.x compatible | No | Yes

Key takeaways

1. Spring Boot 3.x uses Micrometer Tracing (micrometer-tracing-bridge-brave + zipkin-reporter-brave), not Spring Cloud Sleuth. Remove Sleuth entirely, as it is incompatible with Spring Boot 3.x.
2. TraceId/spanId propagate automatically across Feign, WebClient.Builder, and RestTemplateBuilder, but only when you use the Spring-configured builder instances rather than manually constructed clients.
3. Kafka tracing requires explicit opt-in: set spring.kafka.listener.observation-enabled=true and spring.kafka.template.observation-enabled=true. Without these, Kafka traces appear as disconnected root traces.
4. MDC traceId/spanId correlation enables log-to-trace linking but is thread-local. Configure ContextPropagatingTaskDecorator on async executors to propagate context to background threads.
5. Default sampling is 10% in both Sleuth and Micrometer Tracing. Set 1.0 in dev, tune production to 1-10% based on target trace volume, and implement custom samplers for always-on error tracing.
6. B3 and W3C are mutually incompatible propagation formats. Use the composite management.tracing.propagation.consume=[B3, W3C] configuration in polyglot environments to prevent silent trace correlation breaks at service boundaries.

Common mistakes to avoid

Using spring-cloud-starter-sleuth with Spring Boot 3.x

Symptom
Application fails to start with classpath conflicts, NoSuchMethodError, or IncompatibleClassChangeError related to Spring Framework 6 / Jakarta EE 10 changes
Fix
Remove spring-cloud-starter-sleuth entirely and add micrometer-tracing-bridge-brave + zipkin-reporter-brave instead

Using new RestTemplate() instead of RestTemplateBuilder.build()

Symptom
HTTP calls between services do not propagate traceId — downstream service starts a new root trace instead of a child span
Fix
Always inject RestTemplateBuilder and call builder.build(). Spring Boot's observation auto-configuration customizes the injected builder, and that customization is what adds the trace headers to outgoing requests
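The difference between the two construction paths can be sketched as a configuration class. This is a minimal example, assuming Spring Boot 3.x with spring-boot-starter-actuator and a tracing bridge on the classpath; the class name HttpClientConfig is hypothetical.

```java
import org.springframework.boot.web.client.RestTemplateBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.RestTemplate;

// Hypothetical configuration class; only the injection pattern matters here.
@Configuration
class HttpClientConfig {

    // Traced: the injected builder has already been customized by Spring Boot,
    // so calls made through this RestTemplate continue the current trace.
    @Bean
    RestTemplate restTemplate(RestTemplateBuilder builder) {
        return builder.build();
    }

    // Not traced: a hand-built instance never sees that customization.
    // RestTemplate untraced = new RestTemplate();
}
```

Inject the RestTemplate bean wherever an outbound call is made; any code path that still calls new RestTemplate() is a likely source of a broken trace.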

Not enabling Kafka listener observation

Symptom
Kafka consumer spans appear as disconnected root traces in Zipkin instead of child spans of the producer's trace
Fix
Set spring.kafka.listener.observation-enabled=true and spring.kafka.template.observation-enabled=true in application.yml

Leaving sampling.probability=1.0 in production

Symptom
Zipkin collector overwhelmed with span data, high memory usage on application pods from span buffering, Zipkin search slows to minutes per query
Fix
Set management.tracing.sampling.probability to a value between 0.05 and 0.1 in production, and use a custom sampler to always sample errors and debug requests
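To make the probability setting concrete, here is a minimal sketch of how a head-based sampler can reach a keep/drop decision from the trace ID alone. It is a simplified model, not Brave's or OpenTelemetry's actual sampler; the class name ProbabilitySamplerSketch and the debug parameter (loosely modelled on the B3 debug flag) are assumptions made for illustration.

```java
/** Illustrative head-based sampler; not the Brave or OpenTelemetry implementation. */
final class ProbabilitySamplerSketch {
    private final boolean keepAll;   // true when probability is exactly 1.0
    private final long threshold;    // masked trace IDs below this value are kept

    ProbabilitySamplerSketch(double probability) {
        // The negated form also rejects NaN.
        if (!(probability >= 0.0 && probability <= 1.0)) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        this.keepAll = probability == 1.0;
        this.threshold = (long) (probability * Long.MAX_VALUE);
    }

    /**
     * Decides once, at the start of the trace, whether to record it.
     *
     * @param traceIdLow low 64 bits of the trace ID
     * @param debug      hypothetical "always keep" override for debug requests
     */
    boolean isSampled(long traceIdLow, boolean debug) {
        if (debug || keepAll) {
            return true;
        }
        long nonNegative = traceIdLow & Long.MAX_VALUE; // clear the sign bit
        return nonNegative < threshold;
    }
}
```

Because the decision depends only on the trace ID, the same trace gets the same answer every time it is evaluated. In practice the decision made by the first service also travels downstream in the propagation headers, so later services do not need to recompute it.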

Mixing B3 and W3C propagation formats across services

Symptom
End-to-end traces are broken at service boundaries between services using different propagation formats — two separate traces appear in Zipkin for one request
Fix
Standardize on one format fleet-wide, or set management.tracing.propagation.consume=[B3, W3C] and management.tracing.propagation.produce=[B3] to accept both formats and emit B3
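The reason the two formats cannot read each other, and what "consume both" means, can be sketched with a small extractor. This is a simplified model rather than Brave's or OpenTelemetry's propagation code: the class and record names are hypothetical, header names are assumed to be lower-cased by the caller, and checks such as all-zero IDs are left out.

```java
import java.util.Map;
import java.util.Optional;

/** Illustrative model of consuming B3 and W3C headers; not a real library extractor. */
final class CompositeExtractorSketch {

    /** The two IDs every format carries, plus which format supplied them. */
    record TraceContext(String traceId, String spanId, String format) {}

    /** Assumes header names were lower-cased by the caller. */
    static Optional<TraceContext> extract(Map<String, String> headers) {
        // B3 single header: b3: {traceId}-{spanId}[-{sampled}[-{parentSpanId}]]
        String b3 = headers.get("b3");
        if (b3 != null) {
            String[] parts = b3.split("-");
            if (parts.length >= 2 && isLowerHex(parts[0], 16, 32) && isLowerHex(parts[1], 16)) {
                return Optional.of(new TraceContext(parts[0], parts[1], "B3"));
            }
        }
        // B3 multi header: both X-B3-TraceId and X-B3-SpanId are required.
        String traceId = headers.get("x-b3-traceid");
        String spanId = headers.get("x-b3-spanid");
        if (traceId != null && spanId != null
                && isLowerHex(traceId, 16, 32) && isLowerHex(spanId, 16)) {
            return Optional.of(new TraceContext(traceId, spanId, "B3_MULTI"));
        }
        // W3C: traceparent: {version}-{traceId}-{parentId}-{flags}
        String traceparent = headers.get("traceparent");
        if (traceparent != null) {
            String[] parts = traceparent.split("-");
            if (parts.length == 4 && isLowerHex(parts[0], 2)
                    && isLowerHex(parts[1], 32) && isLowerHex(parts[2], 16)) {
                return Optional.of(new TraceContext(parts[1], parts[2], "W3C"));
            }
        }
        // Nothing recognised: a real service would start a new root trace here.
        return Optional.empty();
    }

    private static boolean isLowerHex(String value, int... allowedLengths) {
        boolean lengthOk = false;
        for (int length : allowedLengths) {
            lengthOk |= value.length() == length;
        }
        if (!lengthOk) {
            return false;
        }
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f'))) {
                return false;
            }
        }
        return true;
    }
}
```

A service that only understands one of these header layouts falls through to the empty result when it receives the other, which is exactly the "two separate traces for one request" symptom. The order in which the sketch tries the formats is arbitrary.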

Not copying MDC to async threads

Symptom
Log statements inside @Async methods or CompletableFuture chains show empty traceId fields, breaking log correlation for async operations
Fix
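Set ContextPropagatingTaskDecorator on the TaskExecutor beans that back @Async (for example via ThreadPoolTaskExecutor.setTaskDecorator). A plain ExecutorService has no decorator hook, so wrap it instead, for example with ContextExecutorService from Micrometer's context-propagation library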
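Why the trace ID disappears on a pooled thread, and what a task decorator does about it, can be shown with plain JDK classes. This is a simplified model, not Spring's ContextPropagatingTaskDecorator, which captures context through Micrometer's context-propagation library; the class ContextCopySketch and its TRACE_ID thread-local are hypothetical stand-ins for the logging MDC.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

/** Illustrative model of a context-copying task decorator; not Spring's implementation. */
final class ContextCopySketch {

    // Stand-in for the logging MDC: one trace ID per thread.
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    /** Captures the caller's trace ID now and restores it around the task on the worker thread. */
    static Runnable decorate(Runnable task) {
        String captured = TRACE_ID.get(); // evaluated on the submitting thread
        return () -> {
            String previous = TRACE_ID.get();
            TRACE_ID.set(captured);
            try {
                task.run();
            } finally {
                // Restore the old value so pooled threads do not leak context between tasks.
                if (previous == null) {
                    TRACE_ID.remove();
                } else {
                    TRACE_ID.set(previous);
                }
            }
        };
    }

    /** Runs a task on a single worker thread and returns the trace ID that task observed. */
    static String traceIdSeenByWorker(boolean decorated) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            AtomicReference<String> seen = new AtomicReference<>();
            Runnable task = () -> seen.set(TRACE_ID.get());
            pool.submit(decorated ? decorate(task) : task).get();
            return seen.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With a trace ID set on the calling thread, the undecorated task observes null while the decorated one observes the caller's ID. The capture has to happen at submission time, on the caller's thread, which is why the decorator must be installed on the executor rather than applied inside the task.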

Querying Zipkin with wrong service name casing

Symptom
Zipkin search returns no results for a service that is definitely producing traces
Fix
Zipkin service names are derived from spring.application.name. The name is case-sensitive in Zipkin UI. Use GET /api/v2/services to see the exact registered service names
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · JUNIOR
Why was Spring Cloud Sleuth deprecated and what replaces it in Spring Bo...
Q02 · SENIOR
Explain how traceId propagates from a Spring MVC service to a downstream...
Q03 · SENIOR
What is the difference between head-based and tail-based sampling, and w...
Q04 · SENIOR
How do you propagate trace context to Kafka consumers in Spring Boot 3.x...
Q05 · SENIOR
A request generates a traceId that appears in logs but never shows up in...
Q06 · JUNIOR
What MDC keys does Micrometer Tracing populate, and how do you include t...
Q07 · SENIOR
How do B3 and W3C TraceContext propagation formats differ, and when woul...
Q08 · SENIOR
How do you ensure trace context propagates correctly in a reactive WebFl...
Q09 · SENIOR
What is the recommended sampling rate for a high-throughput production s...
Q01 of 09 · JUNIOR

Why was Spring Cloud Sleuth deprecated and what replaces it in Spring Boot 3.x?

ANSWER
Spring Cloud Sleuth was built on Spring Boot 2.x and Spring Framework 5 internals, so it is not compatible with the Jakarta EE 10 / Spring Framework 6 foundation of Spring Boot 3.x. Micrometer Tracing is the replacement: it is part of the Micrometer observability stack and provides a vendor-neutral Tracer API with bridge implementations for both Brave (Zipkin) and OpenTelemetry. The configuration prefix moved from spring.sleuth.* to management.tracing.*, and MDC keys changed from X-B3-TraceId to traceId.
FAQ · 7 QUESTIONS

Frequently Asked Questions

01. Is Spring Cloud Sleuth still supported for Spring Boot 2.x projects?
02. Can I use Jaeger or Grafana Tempo instead of Zipkin?
03. How much latency does distributed tracing add to each request?
04. What happens to traces if Zipkin is unavailable?
05. How do I add custom business data to spans for easier searching in Zipkin?
06. How do I test that trace propagation works correctly in unit and integration tests?
07. What is the Observation API and how does it relate to tracing?