Senior 4 min · March 17, 2026

Distributed Tracing with Jaeger

Jaeger Missing Spans — Async Context Propagation Fix

Q: What is the difference between distributed tracing, logging, and metrics?

Logs are time-stamped text events from a single service. Metrics are aggregated numerical measurements (request rate, error rate, latency percentiles). Distributed traces show the causal chain of events across services for a single request. Observability requires all three: metrics to know something is wrong, logs to see what happened, traces to find where.

Q: What is sampling in distributed tracing?

Recording every trace at high traffic volumes is expensive. Sampling records only a fraction of traces — head-based sampling decides at the start of a request (simple, misses tail latency). Tail-based sampling decides after the trace completes, keeping slow or error traces — more accurate but requires buffering. Jaeger supports both. Common approach: sample 1-5% of normal traces, always sample errors.

Q: Can I use Jaeger without OpenTelemetry?

Yes, but it is no longer recommended. Jaeger used to ship its own client libraries, and they have since been deprecated in favour of OpenTelemetry. OpenTelemetry is the industry standard and lets you switch backends (e.g., to Zipkin or Datadog) without changing instrumentation; with the Jaeger clients you're locked in.

Q: How do I persist Jaeger traces?

Jaeger supports multiple storage backends: Elasticsearch, Cassandra, and Kafka (as intermediate). In production, set SPAN_STORAGE_TYPE=elasticsearch and configure ES connection. The all-in-one image uses in-memory storage — data is lost on restart.

Q: What is the overhead of enabling distributed tracing?

Depends on sampling rate and instrumentation depth. Auto-instrumentation adds <1ms per HTTP request. Manual spans add a few microseconds each (span creation, attribute setting). The larger overhead is network: exporting spans requires a TCP connection to the collector. Use batching (BatchSpanProcessor) to amortise cost. At 1% sampling, overhead is negligible.

Kafka consumers showing separate trace IDs? Raw client libraries skip traceparent headers.

Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Lessons pulled from things that broke in production.

✓ Production

production tested

June 10, 2026

last updated

1,554

articles · all by Naren


⚡Quick Answer

A trace tracks one request across multiple services; a span is a single operation within a service.
Jaeger is an open-source CNCF-graduated tracing backend that stores, indexes, and visualises traces.
Instrument code with OpenTelemetry, export via OTLP (port 4317/4318), and view traces at Jaeger UI.
Context propagation via W3C traceparent header is critical for cross-service visibility.
Sampling (head-based or tail-based) trades accuracy for storage cost — always sample errors 100%.

✦ Definition~90s read

What is Distributed Tracing with Jaeger?

Distributed tracing with Jaeger lets you follow a single request as it hops across microservices, databases, and queues. Each unit of work is a span; a chain of spans forms a trace. The core mechanism is context propagation: trace metadata (trace ID, span ID) must travel across every service boundary via HTTP headers, message envelopes, or gRPC metadata.

When that handoff fails, spans become orphans: they exist in Jaeger but belong to no parent, or the entire trace breaks into disconnected fragments. This is the single most common cause of missing spans in production, not sampling or network issues. OpenTelemetry is the modern standard for instrumentation, replacing Jaeger's native clients.

It handles context propagation automatically for popular frameworks like FastAPI, but only if you configure the propagator correctly — typically W3C TraceContext or Jaeger's own format. Running Jaeger locally with Docker is trivial (docker run -p 16686:16686 jaegertracing/all-in-one:latest), but production setups require careful sampling strategy decisions: head-based sampling (e.g., probabilistic 1%) is simple but can miss rare errors; tail-based sampling (e.g., Jaeger's own or OpenTelemetry Collector's) preserves complete traces for problematic requests.

When spans go missing, first check that every service uses the same propagator, then verify that async work explicitly carries context. In Python, contextvars follow asyncio tasks automatically, but they do not follow work handed to a plain ThreadPoolExecutor, a background worker process, or a Celery task. The fix is almost always a missing with tracer.start_as_current_span() block or a forgotten propagator.inject() in a custom middleware.
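To make the thread-pool case concrete, here is a minimal sketch that uses only the standard library. It assumes the tracing SDK keeps the active span in a contextvars variable (OpenTelemetry's Python SDK does this by default) and a standard CPython build. The names `current_trace_id`, `read_trace_id`, and `run_demo` are hypothetical stand-ins, not OpenTelemetry APIs.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the SDK's "current span" storage.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)


def read_trace_id():
    """Runs on a worker thread and reports which trace ID it can see."""
    return current_trace_id.get()


def run_demo():
    current_trace_id.set("4bf92f3577b34da6a3ce929d0e0e4736")
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Plain submit: the worker thread does not see the caller's context.
        lost = pool.submit(read_trace_id).result()
        # Explicit hand-off: snapshot the caller's context and run inside it.
        snapshot = contextvars.copy_context()
        kept = pool.submit(snapshot.run, read_trace_id).result()
    return lost, kept
```

The second submit is the pattern to copy: capture the context where the request is handled, then run the background callable inside that snapshot so any span it starts has the right parent.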

Plain-English First

Imagine tracking a package through multiple delivery trucks, but each driver starts a new tracking number instead of passing along the original one. The customer sees several disconnected packages instead of one continuous journey. Fixing context propagation means making sure every driver writes down the original tracking number before handing off the box.

Missing spans in Jaeger are almost always caused by broken context propagation across async boundaries, not sampling or network failures. When trace IDs fail to cross thread pools, message queues, or background workers, traces fragment into orphaned spans that hide critical latency bottlenecks. This article shows how to diagnose and fix async context propagation using OpenTelemetry's propagator API.

How Distributed Tracing with Jaeger Works — and Why Spans Go Missing

Distributed tracing with Jaeger tracks a single request as it propagates through microservices by assigning a unique trace ID and attaching spans — each span representing a unit of work with start time, duration, and metadata. The core mechanic is context propagation: the trace ID must be passed across service boundaries via HTTP headers (or message queue metadata) so that spans from different services can be stitched into one trace. Without correct propagation, spans become orphaned and the trace is incomplete.

Jaeger stores traces in a backend (Cassandra or Elasticsearch, optionally with Kafka as an intermediate buffer) and exposes them via a UI. In practice, each service must extract the incoming trace context, create child spans, and inject the context into outgoing requests. This is typically done with OpenTelemetry SDKs, which handle serialization and deserialization of trace context. The key property that matters: if any service in the chain fails to propagate context (due to async boundaries, thread pool switches, or manual HTTP clients), the trace breaks at that point.
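The two ways a trace can break look different in the data, and a small sketch may help tell them apart. The `Span` record and the helper names below are hypothetical (they are not Jaeger's storage model): a fragment is a span that started a fresh trace ID, while an orphan points at a parent that never arrived.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    service: str


def group_by_trace(spans):
    """Bucket spans by trace ID; one request should yield exactly one bucket."""
    traces = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return dict(traces)


def find_orphans(spans):
    """Spans whose parent_id is set but whose parent is absent from the same trace."""
    orphans = []
    for trace_spans in group_by_trace(spans).values():
        known = {s.span_id for s in trace_spans}
        orphans.extend(s for s in trace_spans if s.parent_id and s.parent_id not in known)
    return orphans


spans = [
    Span("t1", "a1", None, "api-gateway"),
    Span("t1", "b1", "a1", "order-service"),
    # Propagation broke here: the consumer started a fresh trace.
    Span("t2", "c1", None, "kafka-consumer"),
    # The parent span was never exported, so this child is orphaned.
    Span("t1", "d1", "zz", "payment-service"),
]
```

More than one bucket for a single request points at propagation; an orphan inside an otherwise intact trace points at a span that was created but never exported.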

Use Jaeger when you need to debug latency spikes, identify service dependencies, or trace errors across more than three services. In production systems handling thousands of requests per second, a single missing span can hide a 500ms bottleneck in a downstream service. Without tracing, you're debugging blind — logs give you local state, but only traces show the full causal chain.

Async Context Is Not Automatic

Java's CompletableFuture, ExecutorService, and reactive streams do not propagate trace context by default — you must manually pass it or use OpenTelemetry's context propagation wrappers.

Production Insight

A payment service using ExecutorService for parallel validation calls lost trace context on the thread pool boundary, causing all downstream spans to appear as root spans.

Symptom: Jaeger UI shows multiple disconnected traces for a single checkout request, each with only one or two spans.

Rule: Always wrap thread pools with OpenTelemetry's Context.taskWrapping(executor) or use @WithSpan on async methods to ensure context flows across threads.

Key Takeaway

Distributed tracing is only as reliable as your context propagation — one missed header breaks the entire trace.

Async boundaries (thread pools, reactive streams, message queues) are the most common source of missing spans in Java.

Always validate trace continuity in staging with a known bad path before relying on traces in production debugging.

Instrumenting a FastAPI App with OpenTelemetry

ExamplePYTHON

# pip install opentelemetry-distro opentelemetry-exporter-otlp
# pip install opentelemetry-instrumentation-fastapi

from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

# Configure tracer
provider = TracerProvider()
exporter = OTLPSpanExporter(endpoint='http://jaeger:4317')  # Jaeger OTLP endpoint
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

app = FastAPI()
FastAPIInstrumentor.instrument_app(app)  # auto-instruments all routes

tracer = trace.get_tracer(__name__)

@app.get('/orders/{order_id}')
async def get_order(order_id: int):
    # `db` and `user_service` are placeholders for your own data-access and HTTP client objects
    with tracer.start_as_current_span('fetch-order') as span:
        span.set_attribute('order.id', order_id)

        # Manual span for a specific operation
        with tracer.start_as_current_span('db-query'):
            order = await db.get_order(order_id)

        with tracer.start_as_current_span('enrich-order'):
            user = await user_service.get_user(order.user_id)  # cross-service call

        return {'order': order, 'user': user}

Output

# Traces exported to Jaeger — visible in Jaeger UI at http://jaeger:16686

Production Insight

Auto-instrumentation only covers framework-level spans.

Manual spans for I/O, locks, or business logic are where most latency hides.

Rule: if you rely only on auto-instrumentation, you'll often miss the root cause.

Key Takeaway

FastAPIInstrumentor handles HTTP routes automatically.

Manual spans give you visibility into the operations that matter most.

Always wrap external calls and critical logic in custom spans.

Running Jaeger with Docker

ExampleBASH

# Run Jaeger all-in-one (development setup)
docker run -d \
  --name jaeger \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

# Ports:
# 16686 — Jaeger UI
# 4317  — OTLP gRPC receiver
# 4318  — OTLP HTTP receiver

# Open Jaeger UI: http://localhost:16686
# Search by service name → see all traces
# Click a trace → see full span timeline
# Click a span → see attributes, events, errors

Output

# Jaeger UI at http://localhost:16686

Production Insight

The all-in-one image bundles storage, collector, and query into one process.

It's fine for dev but loses all traces on restart — use Elasticsearch or Cassandra in prod.

Rule: never use all-in-one for production; you won't have trace persistence.

Key Takeaway

Docker run gets you started in 30 seconds.

Port 16686 = UI, 4317 = OTLP gRPC, 4318 = OTLP HTTP.

Lossy dev mode — plan for persistent storage before you go live.

Understanding Spans, Traces, and Context Propagation

A span is a named, timed operation that carries a span ID, trace ID, parent span ID, and attributes. The entire set of spans linked by a common trace ID forms a trace. Context propagation is what connects spans across service boundaries — it passes the trace ID and parent span ID via HTTP headers (W3C Trace Context: traceparent and tracestate). Without propagation, each service creates a separate trace, and you lose the end-to-end view. Propagation is automatic when using OpenTelemetry instrumentation libraries (they inject headers on outgoing requests). If you use raw HTTP clients or message queues, you must manually inject and extract the context.

propagation_example.pyPYTHON

import requests
from opentelemetry import trace
from opentelemetry.propagate import inject, extract

tracer = trace.get_tracer(__name__)

# Sending service
with tracer.start_as_current_span('outgoing-call') as span:
    headers = {}
    inject(headers)  # writes traceparent for the current span into the dict
    response = requests.get('http://payment-service/process', headers=headers)

# Receiving service
# In middleware or at route entry, extract context from the incoming request
# (`request` is the framework's request object, e.g. FastAPI's Request)
ctx = extract(request.headers)
with tracer.start_as_current_span('process-payment', context=ctx) as span:
    # this span is now a child of the sending service's span
    pass

Manual propagation is a common trap

If you're using HTTP clients not covered by auto-instrumentation (e.g., raw requests or httpx without integration), you must inject and extract headers manually. The W3C traceparent header is the standard — don't invent your own.

Production Insight

Missing context propagation is the #1 reason traces break at service boundaries.

Check for the traceparent header in your incoming requests to verify propagation.

Rule: if you see a new trace ID after a cross-service call, propagation is broken.

Key Takeaway

Spans get connected via trace ID propagated across services.

If your trace is broken into pieces, context propagation is the culprit.

Inject headers on outgoing calls; extract on incoming — always verify.

Propagation method decision

If: using an auto-instrumented library (FastAPIInstrumentor, requests integration)
→ Use: propagation is automatic; no extra code needed.

If: using a custom HTTP client or non-HTTP transport (Redis, Kafka)
→ Use: you must manually inject/extract context using the OpenTelemetry API.

Sampling Strategies in Production

Recording every trace at production scale is expensive — both in storage and network bandwidth. Sampling decides which traces to keep. Head-based sampling makes the decision at the start of a request (e.g., keep 1% of all traces). It's simple but can miss rare high-latency events. Tail-based sampling buffers traces and decides after they complete, keeping those that exceed a latency threshold or contain errors. Jaeger supports both. A common hybrid approach: sample 1–5% of normal requests and always sample requests with HTTP 5xx or custom error attributes.

Setting the right sampling rate is a trade-off. Too low (0.1%) and you'll miss most issues; too high (100%) for high-throughput services will overwhelm storage. Start at 1% and adjust based on storage budget and trace usefulness.
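As a rough illustration of how a head-based probabilistic decision can be made, the sketch below derives the answer from the trace ID itself, so every service that sees the same trace ID reaches the same verdict. It follows the general idea behind trace-ID-ratio samplers; the exact bits used differ between SDKs, and `should_sample` here is a hypothetical helper rather than a library function.

```python
import random

# Only the low 64 bits of the 128-bit trace ID are used in this sketch.
TRACE_ID_LIMIT = (1 << 64) - 1


def should_sample(trace_id: int, rate: float) -> bool:
    """Deterministic head-based decision: same trace ID, same answer everywhere."""
    bound = round(rate * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


# Simulate 20,000 requests with random trace IDs and a 1% rate.
rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(20_000)]
kept = sum(should_sample(t, 0.01) for t in trace_ids)
```

Because the decision needs nothing but the trace ID, it costs almost nothing at request start. That is also its limit: the outcome of the request cannot influence it.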

jaeger-sampling-strategies.jsonJSON

{
  "service_strategies": [
    {
      "service": "order-service",
      "type": "probabilistic",
      "param": 0.005,
      "operation_strategies": [
        {
          "operation": "/orders/{id}",
          "type": "probabilistic",
          "param": 0.01
        }
      ]
    },
    {
      "service": "payment-service",
      "type": "probabilistic",
      "param": 0.05
    }
  ]
}

Watch out for sampling latency bias

Head-based sampling decides before the request completes, so it cannot favour slow requests: at a 1% sampling rate you keep only about 1% of slow responses and miss the other 99%. Tail-based sampling solves this but requires buffering and adds memory overhead.
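The buffering cost is easier to see in code. This is a deliberately simplified sketch of a tail-based decision; the `TailSampler` class, the span dictionaries, and the explicit `finish_trace` call are all assumptions for illustration. Real implementations usually decide after a wait timeout rather than an explicit "trace finished" signal.

```python
from collections import defaultdict


class TailSampler:
    """Buffers spans per trace and decides once the trace is finished."""

    def __init__(self, latency_threshold_ms: float):
        self.latency_threshold_ms = latency_threshold_ms
        self._buffer = defaultdict(list)  # trace_id -> list of span dicts

    def add_span(self, span: dict) -> None:
        # Every span is held in memory until its trace is decided.
        self._buffer[span["trace_id"]].append(span)

    def finish_trace(self, trace_id: str) -> list:
        """Returns the spans to export, or an empty list if the trace is dropped."""
        spans = self._buffer.pop(trace_id, [])
        has_error = any(s["error"] for s in spans)
        too_slow = any(s["duration_ms"] > self.latency_threshold_ms for s in spans)
        return spans if (has_error or too_slow) else []


sampler = TailSampler(latency_threshold_ms=500)
sampler.add_span({"trace_id": "fast", "duration_ms": 40, "error": False})
sampler.add_span({"trace_id": "slow", "duration_ms": 900, "error": False})
sampler.add_span({"trace_id": "bad", "duration_ms": 30, "error": True})
```

The buffer is where the memory overhead comes from: every in-flight trace is held in full until the decision is made.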

Production Insight

Setting sampling per-operation is key: payment services need higher rate than health checks.

Use remote sampling configuration (Jaeger Collector) to change rates without redeploying.

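Rule: keep error traces at 100% regardless of the overall rate. A head-based sampler cannot do this because the error has not happened yet when it decides, so apply a tail-based policy (for example, the OpenTelemetry Collector's tail_sampling processor with a status-code policy).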

Key Takeaway

Head-based sampling is simple but can miss slow requests.

Tail-based sampling captures the long tail but costs more.

Best practice: 100% for errors, 1–5% for normal traffic per service tier.

Troubleshooting Missing Spans and Broken Traces

When a trace doesn't appear in Jaeger UI, or appears incomplete, the root cause is almost always one of: (1) context not propagated, (2) spans not exported, (3) sampling dropped the trace, (4) clock skew between service hosts. Use the following systematic checks. First, confirm you're hitting the Jaeger endpoint by looking at application logs for OTLP export errors. Second, check the trace ID uniformity — if each service generates its own trace ID, propagation is missing. Third, verify that the "Trace" view shows all expected spans — missing spans may indicate a failing exporter or network issue. Fourth, if spans from different services appear with wrong timing, check NTP synchronisation: Jaeger relies on span timestamps for ordering.

Think of traces as breadcrumbs

Each service drops its breadcrumb (span) and passes the trace ID onward.
If the breadcrumb is missing (span not created), the chain breaks.
If the trace ID is not passed (propagation failure), the chain splits into separate chains.
Your debugging goal: find the first service where the breadcrumb pattern changes.

Production Insight

Clock skew of even 100ms can cause spans to appear out of order in the UI.

Run NTP on all nodes and monitor drift — Jaeger has a built-in clock skew adjustment but it's not perfect.

Rule: if a trace's spans jump backwards in time, check NTP first.
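One cheap way to spot skew in exported data is to look for children that claim to start before their parent. The sketch below assumes spans are available as plain dictionaries with `span_id`, `parent_id`, and `start_us` (microseconds) fields; those field names and the `find_skewed_children` helper are hypothetical.

```python
def find_skewed_children(spans):
    """Return (child_id, skew_us) for children that start before their parent."""
    by_id = {s["span_id"]: s for s in spans}
    skewed = []
    for span in spans:
        parent = by_id.get(span["parent_id"])
        if parent is not None and span["start_us"] < parent["start_us"]:
            skewed.append((span["span_id"], parent["start_us"] - span["start_us"]))
    return skewed


spans = [
    {"span_id": "a", "parent_id": None, "start_us": 1_000_000},
    {"span_id": "b", "parent_id": "a", "start_us": 1_020_000},  # plausible
    {"span_id": "c", "parent_id": "a", "start_us": 880_000},    # 120 ms "before" its parent
]
```

A child cannot really begin before the call that triggered it, so any hit here is a clock problem on one of the two hosts rather than a latency finding.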

Key Takeaway

Missing traces = propagation fail or sampling drop.

Incomplete traces = missing spans (exporter error or code bug).

Out-of-order spans = clock skew — NTP is not optional.

Integrating Traces with Logs and Metrics

Distributed tracing alone doesn't replace logs or metrics; it complements them. The true power emerges when you correlate trace IDs with log entries and metric events. OpenTelemetry enables this via trace_id injection into log records (MDC in Java, structlog in Python). Metric tools like Prometheus can attach trace IDs to samples as exemplars; avoid putting trace IDs in labels, where they would explode cardinality. Jaeger's UI allows you to drill from a trace to related logs if you configure the log integration.

A common production pattern: when a latency alert fires, grab the trace ID from the affected request, open Jaeger to see the breakdown, then jump to the logs from that span ID to inspect the exact error message.

log_correlation.pyPYTHON

import structlog
from opentelemetry import trace

span = trace.get_current_span()
trace_id = format(span.get_span_context().trace_id, '032x')
span_id = format(span.get_span_context().span_id, '016x')

# Inject trace context into log
logger = structlog.get_logger()
logger.info("payment processed", trace_id=trace_id, span_id=span_id, order_id=123)
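If you use the standard logging module instead of structlog, the same idea can be sketched with a logging filter that stamps every record. In this illustration the trace ID comes from a hypothetical `trace_id_var` context variable so the example stays self-contained; in real code the filter would read it from the current span as in the snippet above.

```python
import contextvars
import io
import logging

# Hypothetical stand-in for "the trace ID of the current span".
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Adds a trace_id attribute to every log record passing through the handler."""

    def filter(self, record):
        record.trace_id = trace_id_var.get()
        return True


stream = io.StringIO()  # captured here for the demo; use a real handler in production
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s"))
handler.addFilter(TraceIdFilter())

logger = logging.getLogger("payments")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

trace_id_var.set("4bf92f3577b34da6a3ce929d0e0e4736")
logger.info("payment processed")
```

Putting the stamp in a filter means individual call sites do not have to remember to pass the trace ID.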

Unify your observability data

Use OpenTelemetry Collector to export traces to Jaeger, metrics to Prometheus, and logs to Loki. Set up a Grafana dashboard that links metrics panels to trace exploration — this is the 'observability pyramid' in practice.

Production Insight

Correlation is worthless if trace IDs aren't in logs from the start.

Instrument your logging layer early — retrofitting trace IDs into a million log lines is painful.

Rule: enforce trace_id presence in all structured logs via pipeline linting.

Key Takeaway

Traces show where; logs show what; metrics show when.

Correlate them via trace IDs in log output and metric exemplars.

Without correlation, you're debugging blind.

Why Your Traces Are Silent: The gRPC vs HTTP Exporter Trap

Most beginners copy-paste a Jaeger exporter configuration and wonder why their traces never show up. The culprit is almost always a port and protocol mismatch. Besides the OTLP ports (4317/4318), Jaeger all-in-one can listen on its legacy native ports: 14250 for gRPC, 14268 for HTTP Thrift, 6831/UDP for the agent, and 9411 for Zipkin. If you use the gRPC JaegerExporter but your Jaeger container isn't listening on 14250, your spans vanish into the void. I've debugged this in three separate microservice migrations. The fix is brutally simple: match your exporter to a port Jaeger actually exposes. For Thrift, use ThriftExporter. For gRPC, ensure COLLECTOR_GRPC_PORT is set. Don't assume every endpoint is active; check your docker logs. This mismatch wastes hours for teams that could be shipping features.

exporter_check.pyPYTHON

# Legacy Jaeger-native exporters for OpenTelemetry Python; new code should prefer OTLP
from opentelemetry.exporter.jaeger.thrift import JaegerExporter as ThriftExporter
from opentelemetry.exporter.jaeger.proto.grpc import JaegerExporter as GrpcExporter

# Thrift exporter sending to the Jaeger agent over UDP (port 6831)
thrift_exporter = ThriftExporter(
    agent_host_name="localhost",
    agent_port=6831,
)

# gRPC exporter sending to the collector (port 14250)
grpc_exporter = GrpcExporter(
    collector_endpoint="localhost:14250",
    insecure=True
)

print("Thrift exporter configured on port 6831")
print("gRPC exporter configured on port 14250")

Output

Thrift exporter configured on port 6831

gRPC exporter configured on port 14250

Production Trap:

Kubernetes often exposes only one port via ingress. If you expose 14268 but ship gRPC traces, you'll get 302 redirects or timeouts. Always validate connectivity with curl -v telnet://jaeger:14250 before blaming your code.

Key Takeaway

Match your Jaeger exporter protocol to the open collector port—gRPC and HTTP are not interchangeable.

Sampling in Production: Don't Bankrupt Your Storage on Every Request

In development, trace every request. In production, that costs real money: storage, network, and CPU. Smart teams use head-based sampling to keep the firehose manageable. A probabilistic sampler with a 5-10% rate catches most anomalies without exploding your budget. But here's the trick: combine it with rate-limiting per endpoint. Your health-check endpoint doesn't need tracing at all. Your payment service deserves higher sampling. I've seen a startup burn $2,000/month on Jaeger storage because they sampled 100% on a high-traffic API. With the OpenTelemetry SDK, set OTEL_TRACES_SAMPLER=parentbased_traceidratio and OTEL_TRACES_SAMPLER_ARG=0.1 (the Jaeger-client equivalents were sampler.type=probabilistic and sampler.param=0.1). For critical flows, add a tail-based policy that always keeps traces with errors. Your SRE team will thank you.

sampling_config.pyPYTHON

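from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import (
    Decision, Sampler, SamplingResult, TraceIdRatioBased,
)

class HealthCheckAwareSampler(Sampler):
    """Drops health-check spans and ratio-samples everything else."""

    def __init__(self, ratio=0.2):
        self._delegate = TraceIdRatioBased(ratio)

    def should_sample(self, parent_context, trace_id, name, kind=None,
                      attributes=None, links=None, trace_state=None):
        # Skip tracing for health checks (span names such as "GET /health")
        if name.endswith("/health"):
            return SamplingResult(Decision.DROP)
        # Sample 20% of everything else
        return self._delegate.should_sample(
            parent_context, trace_id, name, kind, attributes, links, trace_state)

    def get_description(self):
        return "HealthCheckAwareSampler"

trace.set_tracer_provider(
    TracerProvider(sampler=HealthCheckAwareSampler())
)

print("Custom sampler applied: health checks dropped, 20% sampling rate for others")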

Output

Custom sampler applied: health checks dropped, 20% sampling rate for others

Pro Tip:

Tail-based sampling is usually done in front of Jaeger, for example with the OpenTelemetry Collector's tail_sampling processor. For ultra-low latency apps, use head-based sampling and store only sampled spans. You can always re-analyze hot paths with manual instrumentation.

Key Takeaway

Sample 5-10% in production—save money, keep signal. Health checks get 0%.

● Production incident · POST-MORTEM · severity: high

Missing Trace Context on Async Event Bus Caused False 'Healthy' Signals

Symptom

Traces for order processing showed only a single span (the HTTP handler) with no downstream spans from the Kafka consumer. The consumer's spans had different trace IDs, so they appeared as separate traces in Jaeger UI.

Assumption

Team assumed OpenTelemetry auto-instrumentation for Kafka would propagate context automatically. It did not — the Kafka integration requires manual setup for header injection.

Root cause

OpenTelemetry's Kafka instrumentation propagates headers only if you use the official producer/consumer API wrappers. The team was using a raw aiokafka library without an integration layer, so no traceparent header was passed.

Fix

Switch to the OpenTelemetry-instrumented Kafka producer/consumer, or manually inject/extract context using opentelemetry.propagate.inject() when producing and extract() when consuming.

Key lesson

Auto-instrumentation is not magic — always verify context propagation at each boundary.
For any async or message-based communication, explicitly inject trace context into messages.
When testing, generate traces from end to end and check in Jaeger UI that a single trace spans all services.
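For the manual route, the awkward part is usually the carrier format: propagators work with string dictionaries, while Kafka clients such as aiokafka take headers as a list of (str, bytes) tuples. The sketch below shows only that conversion, using two hypothetical helper names; the comments mark where inject() and extract() would be called in real producer and consumer code.

```python
def carrier_to_kafka_headers(carrier: dict) -> list:
    """Convert a propagator carrier into Kafka-style (str, bytes) header tuples."""
    return [(key, value.encode("utf-8")) for key, value in carrier.items()]


def kafka_headers_to_carrier(headers) -> dict:
    """Rebuild a plain dict that a propagator's extract() can read."""
    return {key: value.decode("utf-8") for key, value in (headers or [])}


# Producer side: in real code the carrier would be filled by
# opentelemetry.propagate.inject(carrier) inside the active span.
carrier = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
headers = carrier_to_kafka_headers(carrier)

# Consumer side: in real code the result would be passed to
# opentelemetry.propagate.extract(...) and used as the parent context.
received = kafka_headers_to_carrier(headers)
```

Tolerating a missing headers value on the consumer side matters during rollout, when some producers have not been updated yet and their messages carry no trace context.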

Production debug guide: systematic checks to find why your distributed trace is broken

Symptom · 01

Trace visible in Jaeger UI but shows only a single span

→

Fix

Check context propagation; verify that the service making downstream calls injects the traceparent header. Test with curl -v and inspect request headers.

Symptom · 02

No traces for a specific service appear at all

→

Fix

Verify the service can reach the Jaeger Collector endpoint. Check service logs for OTLP export errors. Confirm the port (4317 for gRPC, 4318 for HTTP) matches collector configuration.

Symptom · 03

Traces appear but with spans out of order or negative duration

→

Fix

Run ntpq -p on all nodes to check clock synchronisation. Spans with timestamps from different hosts can be misordered if clocks drift more than 100ms.

Symptom · 04

Only a small fraction of traces appear despite high request volume

→

Fix

Check sampling configuration. Confirm you're not using a head-based sampler with rate too low for the traffic pattern. Look at Jaeger Collector metrics for 'sampling.dropped'.

Symptom · 05

Traces contain spans from service A but not service B, though B is called

→

Fix

B likely has a bug in its instrumentation or exporter. Test B in isolation: send a request that produces a trace and verify it appears. Common cause: missing OpenTelemetry package or wrong exporter endpoint.

★ Quick Trace Debug Cheat Sheet: five common trace issues and the exact commands to diagnose them

No traces in Jaeger UI

Immediate action

Ping the collector: curl http://jaeger-collector:4318

Commands

kubectl logs -l app=order-service --tail=20 | grep -i otlp

docker logs jaeger 2>&1 | grep -i error

Fix now

Restart the instrumented service after verifying endpoint env vars

Spans missing from a trace

Distorted span timings (negative or huge values)

Sampling rate too aggressive (traces missing)

Context not propagated to downstream service

Key takeaways

A trace = complete request journey across services. A span = one operation within a service.

OpenTelemetry is the vendor-neutral instrumentation API: use it to avoid lock-in.

FastAPIInstrumentor auto-instruments all routes: you only need manual spans for important sub-operations.

Trace context (trace ID, span ID) propagates via HTTP headers (traceparent) between services.

Use span attributes to add business context: order.id, user.id make filtering useful.

Sampling is a storage vs accuracy trade-off: always sample errors 100%, tune per-operation rates.

Clock skew breaks trace timelines: NTP synchronisation is mandatory in distributed systems.

Correlate trace IDs with logs and metrics for full observability, or you're still flying blind.

Common mistakes to avoid

5 patterns

Using auto-instrumentation only and assuming all spans are captured

Symptom

Critical latency inside a database call or cache lookup is invisible because no manual span wraps it. The trace shows the HTTP handler but not the expensive operation inside.

Fix

Add manual spans with tracer.start_as_current_span around every external I/O, lock acquisition, or business logic block that can take >10ms.

Running Jaeger all-in-one in production without persistent storage

Symptom

Traces disappear after container restart. Incident post-mortems have no traces because they were lost during the reboot.

Fix

Deploy Jaeger with a persistent storage backend (Elasticsearch or Cassandra, optionally fronted by Kafka as a buffer), configured via environment variables such as SPAN_STORAGE_TYPE=elasticsearch and the matching connection endpoints.

Setting a global sampling rate without considering operation criticality

Symptom

Payment failures or latency spikes are rarely captured because the sampling rate is 1% and the incident happens in the 99% unsampled requests.

Fix

Use Jaeger's remote sampling configuration to set higher rates for critical endpoints (payment, auth) and lower rates for health checks and static content.

Not injecting trace context into asynchronous or batch job spans

Symptom

An API request kicks off a background job; the job's spans have a different trace ID, so you can't link the request to the job execution.

Fix

Pass the trace context via message headers (Kafka, RabbitMQ) or database column when enqueuing jobs. On the worker side, extract the context before starting the worker span.

Forgetting to handle clock skew across hosts

Symptom

Spans in the Jaeger UI appear with negative duration or overlapping incorrectly. Root cause analysis becomes unreliable.

Fix

Run NTP daemon on all servers. Monitor clock offset in your observability dashboards. Alert if offset exceeds 10ms.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR

Explain how distributed tracing works at the protocol level. How does a ...

Q02SENIOR

Compare head-based and tail-based sampling. When would you use each?

Q03SENIOR

What is the role of the OpenTelemetry Collector? How does it differ from...

Q04SENIOR

How would you debug a distributed trace that appears incomplete in Jaege...

Q01 of 04SENIOR

Explain how distributed tracing works at the protocol level. How does a span get linked to its parent across service boundaries?

ANSWER

Each span carries a trace ID, span ID, and parent span ID. When service A calls service B, OpenTelemetry injects a traceparent HTTP header with the current trace ID and span ID. Service B extracts that header and creates a new span with the same trace ID and the received span ID as parent. This creates a directed acyclic graph of spans. The header format is: 00-{trace_id}-{span_id}-{trace_flags} (W3C Trace Context).
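The header handling described in this answer can be sketched in a few lines of standard-library Python. `parse_traceparent` and `child_traceparent` are hypothetical helpers written for illustration; in practice the OpenTelemetry propagator does this for you. The sketch handles only the basic version-00 layout.

```python
import re
import secrets

# version - trace ID (32 hex) - parent span ID (16 hex) - flags (2 hex)
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header: str):
    """Split a traceparent value into its fields, or return None if it is malformed."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # All-zero IDs are invalid per the W3C Trace Context spec.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {"version": version, "trace_id": trace_id, "parent_id": parent_id, "flags": flags}


def child_traceparent(incoming: str) -> str:
    """Build the header service B would send downstream: same trace ID, new span ID."""
    fields = parse_traceparent(incoming)
    if fields is None:
        raise ValueError("malformed traceparent")
    new_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    return f"00-{fields['trace_id']}-{new_span_id}-{fields['flags']}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outgoing = child_traceparent(incoming)
```

The trace ID is carried through unchanged while the span ID is replaced at every hop; that pair of facts is what lets the backend rebuild the parent-child tree.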
