Performance Tuning Guide¶
This guide covers configuration options for optimizing genai-otel-instrument in production.
Import Time¶
The library uses lazy imports. import genai_otel loads in ~20ms. Heavy modules (OTel SDK, instrumentors, GPU metrics) are only loaded when you call genai_otel.instrument() or access specific classes.
Sampling Rate¶
Reduce telemetry volume by sampling a fraction of traces:
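A minimal example, using the GENAI_SAMPLING_RATE variable listed in the production checklist at the end of this guide:

```shell
# Trace roughly 10% of requests
export GENAI_SAMPLING_RATE=0.1
```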
A sampling rate of 1.0 (default) traces everything. Set to 0.0 to disable tracing entirely.
Batch Export Configuration¶
The library uses BatchSpanProcessor and PeriodicExportingMetricReader for non-blocking export. OpenTelemetry environment variables control batching:
# Spans: max batch size and export interval
export OTEL_BSP_MAX_EXPORT_BATCH_SIZE=512 # default: 512
export OTEL_BSP_SCHEDULE_DELAY_MILLIS=5000 # default: 5000ms
export OTEL_BSP_MAX_QUEUE_SIZE=2048 # default: 2048
# Metrics: export interval
export OTEL_METRIC_EXPORT_INTERVAL=60000 # default: 60000ms (1 min)
For high-throughput applications, increase OTEL_BSP_MAX_QUEUE_SIZE and OTEL_BSP_MAX_EXPORT_BATCH_SIZE so spans are not dropped when the queue fills between exports.
Content Capture¶
Capturing prompt/response content adds overhead to span size and export bandwidth:
# Disable content capture for lower overhead
export GENAI_ENABLE_CONTENT_CAPTURE=false
# Or limit content length (default: 1000 chars)
export GENAI_CONTENT_MAX_LENGTH=200
GPU Metrics¶
GPU metrics collection runs in a background thread. Disable if not needed:
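For example, using the GENAI_ENABLE_GPU_METRICS variable listed in the production checklist below:

```shell
# Skip starting the GPU metrics background thread
export GENAI_ENABLE_GPU_METRICS=false
```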
Or adjust collection interval:
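A sketch of what this might look like; the variable name below is hypothetical, so check the library's configuration reference for the exact setting:

```shell
# Hypothetical variable name -- verify against the configuration reference
export GENAI_GPU_COLLECTION_INTERVAL=30  # seconds between collections
```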
Cost Tracking¶
Cost calculation adds minimal overhead but can be disabled:
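A sketch of disabling it; the flag name below follows the library's GENAI_ENABLE_* naming pattern but is an assumption, so verify it in the configuration reference:

```shell
# Hypothetical flag name -- verify against the configuration reference
export GENAI_ENABLE_COST_TRACKING=false
```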
MCP Instrumentation¶
Disable database/cache/API instrumentation if not needed:
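A sketch following the same GENAI_ENABLE_* naming pattern; the exact variable name is an assumption and should be checked against the configuration reference:

```shell
# Hypothetical flag name -- verify against the configuration reference
export GENAI_ENABLE_MCP_INSTRUMENTATION=false
```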
Selective Instrumentors¶
Only enable the instrumentors you need:
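For example, using the GENAI_ENABLED_INSTRUMENTORS variable from the production checklist below; the provider names here are illustrative:

```shell
# Comma-separated allowlist of instrumentors (provider names are examples)
export GENAI_ENABLED_INSTRUMENTORS=openai,anthropic
```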
OTLP Protocol¶
HTTP (default) works for most setups. gRPC may offer better export performance for high-throughput workloads:
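The protocol is selected with the standard OpenTelemetry exporter variable:

```shell
# Switch the OTLP exporter from http/protobuf (default) to gRPC
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
```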
OTLP Timeout¶
Increase timeout for slow networks or reduce for faster failure:
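This is controlled by the standard OpenTelemetry exporter timeout variable, which takes a value in milliseconds (default 10000 ms per the OpenTelemetry specification):

```shell
# Allow up to 30 s per export batch on slow networks
export OTEL_EXPORTER_OTLP_TIMEOUT=30000
```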
Production Checklist¶
- Set GENAI_SAMPLING_RATE to 0.1-0.5 for high-traffic services
- Disable GENAI_ENABLE_CONTENT_CAPTURE unless needed for debugging
- Set GENAI_ENABLE_GPU_METRICS=false if no GPUs are used
- Use GENAI_ENABLED_INSTRUMENTORS to limit to providers you actually use
- Set GENAI_FAIL_ON_ERROR=false (default) so instrumentation issues don't crash your app
- Configure OTEL_BSP_MAX_QUEUE_SIZE based on your request volume