How to configure head sampling in OpenTelemetry

OpenTelemetry head sampling decides whether a trace is recorded at the moment the application creates the root span. Setting the sampler in the SDK reduces exported span volume before telemetry reaches a Collector or backend, which helps high-volume services keep trace cost and storage under control.

The standard SDK environment variables are OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG. Use parentbased_traceidratio when a service should honor an incoming parent sampling decision and apply a probability only when it starts a new trace.
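
The decision that parentbased_traceidratio makes can be modeled in a few lines. The sketch below is a simplified illustration rather than SDK code: the function names are hypothetical, the parent is reduced to a single sampled flag, and the threshold comparison on the low 64 bits of the trace ID follows the approach used by the Python SDK's TraceIdRatioBased sampler, which other language SDKs may implement differently.

```python
import random
from typing import Optional

TRACE_ID_LOW_BITS = (1 << 64) - 1  # mask for the low 64 bits of a 128-bit trace ID


def ratio_decision(trace_id: int, ratio: float) -> bool:
    """Sample when the low 64 bits of the trace ID fall below ratio * 2**64."""
    bound = round(ratio * (1 << 64))
    return (trace_id & TRACE_ID_LOW_BITS) < bound


def parent_based_decision(parent_sampled: Optional[bool], trace_id: int, ratio: float) -> bool:
    """Follow the parent's sampled flag; use the ratio only for root spans."""
    if parent_sampled is not None:
        return parent_sampled
    return ratio_decision(trace_id, ratio)


# Root spans (no parent): the ratio applies to randomly generated trace IDs.
rng = random.Random(7)
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(parent_based_decision(None, tid, 0.25) for tid in trace_ids)
print(f"root traces kept: {kept} of {len(trace_ids)}")

# Child spans inherit the parent's decision regardless of the local ratio.
print(parent_based_decision(True, trace_ids[0], 0.0))   # True
print(parent_based_decision(False, trace_ids[0], 1.0))  # False
```

Because the root decision depends only on the trace ID, every service that applies the same ratio to the same trace ID reaches the same answer, and the parent flag carries that answer to child spans.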

Head sampling cannot bring a trace back after the fact. If an error, slow span, or important attribute appears downstream in a trace that was not sampled, those spans were never exported and cannot be retrieved. Use SDK head sampling to limit volume at the edge, and use Collector tail sampling when the decision must inspect the complete trace after spans arrive.
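
To make that limitation concrete, the toy simulation below makes a head decision before each request's outcome is known. It is not OpenTelemetry code: the 2 percent error rate, the request count, and the variable names are invented for illustration, and each trace is reduced to a pair of random draws.

```python
import random

rng = random.Random(42)
RATIO = 0.25        # head sampling probability for new root traces
ERROR_RATE = 0.02   # assumed share of requests that end in an error

total_errors = 0
exported_errors = 0
exported_traces = 0

for _ in range(50_000):
    # The head decision is made when the root span starts...
    sampled = rng.random() < RATIO
    # ...and the outcome of the request is only known afterwards.
    failed = rng.random() < ERROR_RATE

    total_errors += failed
    if sampled:
        exported_traces += 1
        exported_errors += failed

print(f"exported traces: {exported_traces}")
print(f"error traces seen by the backend: {exported_errors} of {total_errors}")
```

In this model, error traces are kept at about the same rate as any other trace, which is the gap that tail sampling in the Collector is meant to close.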

Steps to configure OpenTelemetry head sampling:

  1. Choose the sampler and ratio for new root traces.

    parentbased_traceidratio applies TraceIdRatioBased only to root spans. Child spans follow the sampled flag from their parent, which keeps a distributed trace from being partly exported by one service and dropped by the next.

  2. Set the sampler in the service startup environment.
    $ export OTEL_TRACES_SAMPLER="parentbased_traceidratio"
  3. Set the sampling ratio argument.
    $ export OTEL_TRACES_SAMPLER_ARG="0.25"

    OTEL_TRACES_SAMPLER_ARG for traceidratio and parentbased_traceidratio is a probability from 0.0 to 1.0. A value of 0.25 samples roughly one quarter of new root traces, but short test runs will not produce exactly 25 percent every time.

  4. Start or restart the instrumented service with the variables in the same environment as the SDK.

    For systemd, containers, and Kubernetes workloads, put the variables in the unit, container spec, deployment manifest, or runtime injection settings before the process starts. Code-created tracer providers that do not read SDK environment variables need the equivalent sampler set in code.

  5. Create a temporary Python SDK sampler check in an environment that has opentelemetry-sdk installed.
    sampling-check.py
    import os

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import SimpleSpanProcessor, SpanExporter, SpanExportResult


    class CountingExporter(SpanExporter):
        """Collect the names of exported spans in memory instead of sending them."""

        def __init__(self):
            self.names = []

        def export(self, spans):
            self.names.extend(span.name for span in spans)
            return SpanExportResult.SUCCESS


    exporter = CountingExporter()
    # No sampler argument is passed, so the SDK reads OTEL_TRACES_SAMPLER
    # and OTEL_TRACES_SAMPLER_ARG from the environment.
    provider = TracerProvider()
    provider.add_span_processor(SimpleSpanProcessor(exporter))
    trace.set_tracer_provider(provider)

    tracer = trace.get_tracer("sampling-check")
    # Start 40 new root traces, each with one child span.
    for index in range(40):
        with tracer.start_as_current_span(f"checkout-{index:02d}"):
            with tracer.start_as_current_span(f"checkout-db-{index:02d}"):
                pass

    provider.force_flush()

    # Separate exported span names into roots and children by their prefix.
    root_spans = [name for name in exporter.names if not name.startswith("checkout-db-")]
    child_spans = [name for name in exporter.names if name.startswith("checkout-db-")]

    print(f"sampler={os.getenv('OTEL_TRACES_SAMPLER', 'unset')}")
    print(f"sampler_arg={os.getenv('OTEL_TRACES_SAMPLER_ARG', 'unset')}")
    print("root_spans_started=40")
    print(f"exported_root_spans={len(root_spans)}")
    print(f"exported_child_spans={len(child_spans)}")
    print("first_exported_roots=" + ",".join(root_spans[:8]))

    The CountingExporter receives only sampled spans. Unsampled root spans and their child spans are not passed to the span processor or exporter.

  6. Run the sampler check from the same shell environment.
    $ python3 sampling-check.py
    sampler=parentbased_traceidratio
    sampler_arg=0.25
    root_spans_started=40
    exported_root_spans=6
    exported_child_spans=6
    first_exported_roots=checkout-04,checkout-05,checkout-27,checkout-29,checkout-35,checkout-37

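    The exact count changes between short runs because trace IDs are generated randomly and TraceIdRatioBased decides from the trace ID. With 40 roots at a ratio of 0.25, the expected count is 10 with a standard deviation of about 2.7, so a result such as 6 is within normal variation. The child count matching the exported root count confirms that parent-based sampling kept each sampled trace together.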

  7. Apply the same variables where the real service starts.

    Keep existing exporter and resource variables such as OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_SERVICE_NAME in the same startup surface. The sampler only changes which spans are recorded and exported; it does not identify the service or choose the destination.

  8. Verify the trace volume in the Collector debug output or backend search after the service restarts.
    Resource service.name=checkout-api
    Span #0
    Name           : checkout-04
    Trace ID       : 4fd0c9fb3a4b0c12c49bb1df32b1d7e8
    Flags          : Sampled
    ##### snipped #####

    For a 0.25 root ratio, expect roughly one quarter of the requests that start a new trace at this service to appear over time, not an exact percentage each minute. If every request still appears, confirm the process started after the sampler variables were set and that application code did not override the tracer provider sampler.

  9. Remove the temporary sampler check file.
    $ rm sampling-check.py