# LLM Providers

TraceVerde auto-instruments 19+ LLM providers. No provider-specific code changes are needed: install the provider SDK, call `genai_otel.instrument()` once, and TraceVerde handles the rest.

## Providers with Full Cost Tracking
| Provider | Models | Install Extra | Example |
|---|---|---|---|
| OpenAI | GPT-4o, GPT-4 Turbo, GPT-5.2, o1/o3, embeddings (50+) | `[openai]` | example |
| OpenRouter | All models via OpenAI-compatible API | `[openrouter]` | example |
| Anthropic | Claude Sonnet 4.6, Claude 3.5/3 series (15+) | `[anthropic]` | example |
| Google AI | Gemini 2.5/2.0 Pro/Flash, PaLM 2 (30+) | `[google]` | example |
| AWS Bedrock | Amazon Titan, Claude, Llama, Mistral (25+) | `[aws]` | example |
| Azure OpenAI | Same as OpenAI, with Azure pricing | `[openai]` | example |
| Cohere | Command R/R+, Embed v4/v3, rerankers (15+) | `[cohere]` | example |
| Mistral AI | Large/Medium/Small, Mixtral, embeddings (20+) | `[mistral]` | example |
| Together AI | DeepSeek-R1, Llama 3.x, Qwen (25+) | `[together]` | example |
| Groq | Llama 3.x, Mixtral, Gemma, Whisper (20+) | `[groq]` | example |
| Ollama | All local models, with token tracking | `[ollama]` | example |
| Vertex AI | Gemini models via Google Cloud | `[vertexai]` | example |
| SambaNova | sarvam-m, Saarika, Bulbul (12+) | `[sambanova]` | example |
| Sarvam AI | Indian language models | `[sarvamai]` | example |
| Replicate | Hardware-based pricing ($/second) | `[replicate]` | example |
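To enable a provider, install TraceVerde with the matching extra from the table above. A minimal sketch, assuming the package is distributed as `genai-otel` (the distribution name is not stated in this section; adjust to the actual package name):

```shell
# Install with the OpenAI extra (assumed distribution name)
pip install "genai-otel[openai]"

# Extras can be combined to enable several providers at once
pip install "genai-otel[openai,anthropic,ollama]"
```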
## Quick Example: OpenAI

```python
import genai_otel

genai_otel.instrument()

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is OpenTelemetry?"},
    ],
    max_tokens=150,
)
print(f"Response: {response.choices[0].message.content}")
print(f"Tokens used: {response.usage.total_tokens}")
# Traces, metrics, and costs are captured automatically
```
## Quick Example: Anthropic

```python
import genai_otel

genai_otel.instrument()

import anthropic

client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain quantum computing in one sentence."}
    ],
)
print(message.content[0].text)
# Cost tracking and token usage are captured automatically
```
## Quick Example: Ollama (Local)

```python
import genai_otel

genai_otel.instrument()

import ollama

response = ollama.chat(
    model="llama2",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])
# Local model traces are captured with token counting
```
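Ollama exposes token counts directly in its chat responses, so token tracking for local models is simple arithmetic. A minimal sketch, assuming the standard `prompt_eval_count` and `eval_count` fields that Ollama's chat API returns (not TraceVerde's internal code):

```python
def total_tokens(resp: dict) -> int:
    """Sum input and output tokens from an Ollama-style chat response."""
    prompt = resp.get("prompt_eval_count", 0)   # input (prompt) tokens
    completion = resp.get("eval_count", 0)      # output (generated) tokens
    return prompt + completion

# Mocked response shape for illustration
mock = {
    "message": {"content": "Rayleigh scattering."},
    "prompt_eval_count": 26,
    "eval_count": 298,
}
print(total_tokens(mock))  # → 324
```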
## Special Providers

### HuggingFace Transformers

Local model execution with estimated costs based on parameter count.

Instruments:

- `pipeline()`
- `AutoModelForCausalLM.generate()`
- `AutoModelForSeq2SeqLM.generate()`
- `InferenceClient` API calls
See examples:
- Basic HuggingFace
- AutoModel
- With PII detection
- With toxicity detection
- With bias detection
- Multiple evaluations
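The exact parameter-count-based estimation formula is internal to TraceVerde and not documented here. As a hedged illustration only, a cost that scales linearly with model size might be sketched like this; the `RATE_PER_BILLION_PARAMS` constant is a made-up placeholder, not TraceVerde's actual rate:

```python
RATE_PER_BILLION_PARAMS = 1e-8  # hypothetical USD per token per billion parameters

def estimate_cost(param_count: int, total_tokens: int) -> float:
    """Rough local-model cost estimate: tokens x rate, scaled by model size."""
    billions = param_count / 1e9
    return total_tokens * billions * RATE_PER_BILLION_PARAMS

# A 7B-parameter model generating 1,000 tokens under this toy rate
print(estimate_cost(7_000_000_000, 1000))
```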
### Hyperbolic

Requires the OTLP gRPC exporter due to conflicts with the `requests` library.

```bash
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export GENAI_ENABLED_INSTRUMENTORS="openai,anthropic,hyperbolic"
```

See the Hyperbolic example.
### Google GenAI (new SDK)

See the Google GenAI example.

### LiteLLM (Multi-Provider Proxy)

LiteLLM enables cost tracking across 100+ providers via a single proxy. See the LiteLLM example.

### Smolagents (HuggingFace Agents)

See the Smolagents example.
## Captured Attributes

For every LLM call:

| Attribute | Description |
|---|---|
| `gen_ai.system` | Provider name (e.g., `"openai"`) |
| `gen_ai.request.model` | Requested model |
| `gen_ai.response.model` | Actual model used |
| `gen_ai.request.type` | Call type (chat, embedding) |
| `gen_ai.usage.prompt_tokens` | Input token count |
| `gen_ai.usage.completion_tokens` | Output token count |
| `gen_ai.usage.total_tokens` | Total tokens |
| `gen_ai.cost.amount` | Estimated cost in USD |
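The estimated cost is derived from the token-count attributes and a per-model price table. The arithmetic can be sketched as follows; the prices shown are illustrative placeholders, not TraceVerde's bundled pricing data:

```python
# Illustrative per-1M-token prices in USD (placeholder values)
PRICES = {
    "gpt-4o-mini": {"prompt": 0.15, "completion": 0.60},
}

def cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Compute a gen_ai.cost.amount-style estimate from token counts."""
    p = PRICES[model]
    return (prompt_tokens * p["prompt"]
            + completion_tokens * p["completion"]) / 1_000_000

# 1,200 prompt tokens + 300 completion tokens under the placeholder prices
print(f"{cost_usd('gpt-4o-mini', 1200, 300):.6f}")  # → 0.000360
```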
## All Examples

Browse all provider examples in the `examples/` directory.