
Distributed Tracing with Jaeger

Distributed tracing with Jaeger — what distributed tracing is, spans and traces, OpenTelemetry instrumentation, deploying Jaeger, and reading traces to debug latency.
🔥 Advanced — solid DevOps foundation required
Quick Answer

Distributed tracing tracks a request as it flows through multiple microservices. A trace is the complete journey; a span is one operation within a service. Every span in a trace carries the same trace ID and records its parent span, so the spans form a tree. Jaeger is an open-source tracing backend that stores and visualises traces. Instrument your code with OpenTelemetry, configure it to export to Jaeger, and you can see exactly where latency comes from across service boundaries.
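To make the trace/span distinction concrete, here is a toy model in plain Python. It is not the OpenTelemetry API, just a sketch of the data a tracing backend works with: every span shares one trace ID and points at its parent, and subtracting child durations from a span's own duration ("self time") shows which operation actually consumed the time. The span names reuse the operations from this tutorial's FastAPI example; the timings are invented for illustration, and children are assumed to run one after another without overlapping.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Toy span record (not the OpenTelemetry API): one timed operation."""
    trace_id: str              # shared by every span in the same request
    span_id: str               # unique to this operation
    parent_id: Optional[str]   # None for the root span
    name: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def self_time_ms(span: Span, spans: list[Span]) -> float:
    """Duration minus time spent in direct children.

    Assumes children run sequentially and do not overlap.
    """
    children = [s for s in spans if s.parent_id == span.span_id]
    return span.duration_ms - sum(c.duration_ms for c in children)


def slowest_operation(spans: list[Span]) -> str:
    """Name of the span with the largest self time in one trace."""
    return max(spans, key=lambda s: self_time_ms(s, spans)).name


# Hypothetical timings for one GET /orders/{order_id} request
trace_spans = [
    Span('t1', 'a', None, 'GET /orders/{order_id}', 0, 120),
    Span('t1', 'b', 'a', 'fetch-order', 2, 118),
    Span('t1', 'c', 'b', 'db-query', 3, 23),
    Span('t1', 'd', 'b', 'enrich-order', 24, 116),
]

print(slowest_operation(trace_spans))  # enrich-order
```

The root span lasts 120 ms, but almost all of that is spent inside enrich-order (92 ms of self time), which is exactly the kind of conclusion a trace timeline in Jaeger lets you read off visually.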

Instrumenting a FastAPI App with OpenTelemetry

Example · PYTHON
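# pip install fastapi uvicorn
# pip install opentelemetry-distro opentelemetry-exporter-otlp
# pip install opentelemetry-instrumentation-fastapi

from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

# Configure the tracer; service.name is the name Jaeger lists under "Service"
provider = TracerProvider(resource=Resource.create({'service.name': 'order-service'}))
# 'jaeger' resolves only on the same Docker network; use http://localhost:4317 otherwise
exporter = OTLPSpanExporter(endpoint='http://jaeger:4317')  # Jaeger OTLP gRPC endpoint
provider.add_span_processor(BatchSpanProcessor(exporter))  # exports spans in background batches
trace.set_tracer_provider(provider)

app = FastAPI()
FastAPIInstrumentor.instrument_app(app)  # one server span per request, for every route

tracer = trace.get_tracer(__name__)


# Hypothetical stand-ins so the example runs; replace them with your real
# database client and an HTTP client for the user service.
class FakeDB:
    async def get_order(self, order_id: int) -> dict:
        return {'id': order_id, 'user_id': 7}


class FakeUserService:
    async def get_user(self, user_id: int) -> dict:
        return {'id': user_id, 'name': 'example'}


db = FakeDB()
user_service = FakeUserService()


@app.get('/orders/{order_id}')
async def get_order(order_id: int):
    # Child of the request span created by FastAPIInstrumentor
    with tracer.start_as_current_span('fetch-order') as span:
        span.set_attribute('order.id', order_id)

        # Manual span for a specific operation
        with tracer.start_as_current_span('db-query'):
            order = await db.get_order(order_id)

        # In a real service this is an HTTP call; the trace continues in the
        # user service only if the HTTP client is instrumented (traceparent header)
        with tracer.start_as_current_span('enrich-order'):
            user = await user_service.get_user(order['user_id'])

        return {'order': order, 'user': user}

# Run with: uvicorn main:app  (assuming this file is saved as main.py)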
▶ Output
# Traces exported to Jaeger — visible in Jaeger UI at http://jaeger:16686

Running Jaeger with Docker

Example · BASH
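# Run Jaeger all-in-one (development setup; traces are kept in memory only)
# COLLECTOR_OTLP_ENABLED is needed on some older 1.x images and harmless on newer ones
docker run -d \
  --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest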

# Ports:
# 16686  Jaeger UI
# 4317   OTLP gRPC receiver
# 4318   OTLP HTTP receiver

# Open Jaeger UI: http://localhost:16686
# Search by service name → see all traces
# Click a trace → see full span timeline
# Click a span → see attributes, events, errors
▶ Output
# Jaeger UI at http://localhost:16686
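If you would rather export over plain HTTP (port 4318 in the docker run command), the OTLP/HTTP exporter can be used instead of the gRPC one. A minimal configuration sketch, assuming the app runs on the same host as the container and using the hypothetical service name order-service; when the endpoint is passed to the constructor it must include the /v1/traces path. It sends one smoke-test span that you can then look up in the Jaeger UI.

```python
# pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-http
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# OTLP over HTTP/protobuf: the explicit endpoint includes the /v1/traces path
exporter = OTLPSpanExporter(endpoint='http://localhost:4318/v1/traces')

# 'order-service' is a hypothetical name; it appears in Jaeger's Service dropdown
provider = TracerProvider(resource=Resource.create({'service.name': 'order-service'}))
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span('smoke-test'):
    pass  # one empty span, just to confirm the pipeline reaches Jaeger

provider.force_flush()  # push the batched span out before the script exits
```

gRPC on 4317 is the usual choice between services; HTTP on 4318 is handy where gRPC is awkward, for example behind proxies that only pass HTTP/1.1.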

🎯 Key Takeaways

  • A trace = complete request journey across services. A span = one operation within a service.
  • OpenTelemetry is the vendor-neutral instrumentation API — use it to avoid lock-in.
  • FastAPIInstrumentor auto-instruments all routes — you only need manual spans for important sub-operations.
  • Trace context (trace ID, parent span ID, sampling flag) propagates between services in the W3C traceparent HTTP header.
  • Use span attributes to add business context (order.id, user.id) so you can filter and search traces by it.
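The traceparent header has a fixed format defined by the W3C Trace Context specification: version, 32-hex-digit trace ID, 16-hex-digit parent span ID and 2-hex-digit flags, joined by dashes, with the lowest flag bit meaning "sampled". In practice OpenTelemetry's propagators (opentelemetry.propagate.inject and extract) and the HTTP client instrumentations write and read it for you; the stdlib sketch below only illustrates the format. It handles version 00 only, and the two-service scenario is hypothetical.

```python
import re
import secrets

# W3C Trace Context, version 00: 00-<trace-id>-<parent-id>-<flags>, lowercase hex
TRACEPARENT_RE = re.compile(r'^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$')


def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build the header a caller sends to the next service."""
    flags = '01' if sampled else '00'
    return f'00-{trace_id}-{span_id}-{flags}'


def parse_traceparent(header: str) -> tuple[str, str, bool]:
    """Return (trace_id, parent_span_id, sampled); raise ValueError if malformed."""
    match = TRACEPARENT_RE.match(header.strip())
    if not match:
        raise ValueError(f'invalid traceparent: {header!r}')
    trace_id, parent_id, flags = match.groups()
    if trace_id == '0' * 32 or parent_id == '0' * 16:
        raise ValueError('all-zero trace or parent id is invalid')
    return trace_id, parent_id, bool(int(flags, 16) & 0x01)


# Service A starts a trace and calls service B
trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 hex chars
span_id_a = secrets.token_hex(8)   # 8 random bytes -> 16 hex chars
header = make_traceparent(trace_id, span_id_a)

# Service B keeps the trace ID and starts a new span whose parent is A's span
received_trace_id, parent_id, sampled = parse_traceparent(header)
span_id_b = secrets.token_hex(8)
print(received_trace_id == trace_id, parent_id == span_id_a, sampled)  # True True True
```

Because the trace ID travels unchanged while each hop contributes a new span ID, Jaeger can stitch spans from different services into one trace.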

Interview Questions on This Topic

  • What is distributed tracing and when would you use it?
  • What is the difference between a trace and a span?
  • What is sampling in tracing and why is it needed?

Frequently Asked Questions

What is the difference between distributed tracing, logging, and metrics?

Logs are time-stamped text events from a single service. Metrics are aggregated numerical measurements (request rate, error rate, latency percentiles). Distributed traces show the causal chain of events across services for a single request. Observability requires all three: metrics to know something is wrong, logs to see what happened, traces to find where.

What is sampling in distributed tracing?

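Recording every trace at high traffic volumes is expensive in storage and processing, so sampling records only a fraction of traces. Head-based sampling decides at the start of a request: it is simple, but because the decision is made before the outcome is known, it misses much of the tail latency and many errors. Tail-based sampling decides after the trace completes and can keep the slow or failed traces; it is more accurate but requires buffering every span until the trace finishes. Jaeger supports head-based sampling directly (including sampling strategies served to clients remotely), while tail-based sampling is usually done in the OpenTelemetry Collector, on which Jaeger v2 is built. A common approach is to keep 1-5% of normal traces and every error trace; keeping all errors requires tail-based sampling.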
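The difference between the two strategies is easiest to see side by side. This stdlib sketch is illustrative only: head_sample derives a ratio decision from the lower 64 bits of the trace ID, similar in spirit to OpenTelemetry's TraceIdRatioBased sampler, so every service reaches the same verdict for the same trace; tail_sample sees the finished spans first. The FinishedSpan type and the 1000 ms "slow" threshold are hypothetical. In the real Python SDK, head sampling is configured with TracerProvider(sampler=ParentBased(TraceIdRatioBased(0.05))), and tail sampling is typically handled by the Collector's tail_sampling processor.

```python
import random
from dataclasses import dataclass

TRACE_ID_LIMIT = 2**64 - 1  # mask for the lower 64 bits of a 128-bit trace ID


def head_sample(trace_id: int, rate: float) -> bool:
    """Decide at the start of a request, using only the trace ID."""
    bound = round(rate * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


@dataclass
class FinishedSpan:
    """Hypothetical summary of a completed span."""
    name: str
    duration_ms: float
    error: bool = False


def tail_sample(trace_id: int, spans: list[FinishedSpan], rate: float,
                slow_ms: float = 1000.0) -> bool:
    """Decide after the whole trace has been buffered."""
    if any(s.error for s in spans):
        return True  # always keep traces that contain an error
    if max(s.duration_ms for s in spans) >= slow_ms:
        return True  # always keep slow requests (tail latency)
    return head_sample(trace_id, rate)  # otherwise keep a small fraction


# Usage: with random trace IDs, head sampling keeps about rate * N traces on average
random.seed(0)
ids = [random.getrandbits(128) for _ in range(10_000)]
kept = sum(head_sample(t, 0.05) for t in ids)
print(f'head-sampled {kept} of {len(ids)} traces')
```

Note that head_sample cannot implement "always keep errors": at the moment it runs, no span has finished yet. That is the buffering cost tail-based sampling pays for its accuracy.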

Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
