Cost Tracking

TraceVerde includes automatic cost tracking with pricing data for 1,050+ models across 30+ providers. Every LLM call is enriched with a per-request cost breakdown.

How It Works

Cost tracking is enabled by default. For every LLM call, TraceVerde:

Reads the model name from the span attributes
Looks up pricing in the built-in llm_pricing.json database
Calculates cost from token usage (prompt + completion)
Adds cost attributes to the span

No configuration is needed: instrument your application and cost attributes are recorded automatically.
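
The four steps above can be sketched in a few lines of Python. This is an illustration only, not TraceVerde's internal code: the `PRICING` table, its prices, the `my-model` name, and the `estimate_cost` helper are all hypothetical, and prices are assumed to be USD per 1k tokens, matching the custom pricing format described later on this page.

```python
# Hypothetical pricing table in USD per 1k tokens (illustrative numbers only).
PRICING = {
    "my-model": {"promptPrice": 0.001, "completionPrice": 0.002},
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> dict:
    """Return cost attributes for one LLM call."""
    price = PRICING.get(model)
    if price is None:
        # Assumption for this sketch: unknown models get no cost attributes.
        return {}
    prompt_cost = prompt_tokens / 1000 * price["promptPrice"]
    completion_cost = completion_tokens / 1000 * price["completionPrice"]
    return {
        "gen_ai.usage.cost.prompt": round(prompt_cost, 6),
        "gen_ai.usage.cost.completion": round(completion_cost, 6),
        "gen_ai.usage.cost.total": round(prompt_cost + completion_cost, 6),
    }


costs = estimate_cost("my-model", prompt_tokens=1250, completion_tokens=1000)
print(costs)
```

With these made-up prices, 1,250 prompt tokens and 1,000 completion tokens give a prompt cost of 0.00125 USD and a completion cost of 0.002 USD.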

Cost Attributes

Every LLM span gets these attributes:

Attribute	Description	Example
`gen_ai.usage.cost.total`	Total cost in USD	`0.003250`
`gen_ai.usage.cost.prompt`	Prompt token cost in USD	`0.001250`
`gen_ai.usage.cost.completion`	Completion token cost in USD	`0.002000`
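
Because the costs are plain numeric span attributes, they can be aggregated with ordinary code once spans have been exported. A minimal sketch, assuming the exported spans are available as Python dictionaries; the `spans` list, its `model` key, and all values are made up for illustration.

```python
from collections import defaultdict

# Hypothetical exported spans; only the cost attribute names come from the table above.
spans = [
    {"model": "model-a", "gen_ai.usage.cost.prompt": 0.001250,
     "gen_ai.usage.cost.completion": 0.002000, "gen_ai.usage.cost.total": 0.003250},
    {"model": "model-a", "gen_ai.usage.cost.prompt": 0.000400,
     "gen_ai.usage.cost.completion": 0.000600, "gen_ai.usage.cost.total": 0.001000},
    {"model": "model-b", "gen_ai.usage.cost.prompt": 0.000100,
     "gen_ai.usage.cost.completion": 0.000300, "gen_ai.usage.cost.total": 0.000400},
]

# Sum the total cost per model.
totals = defaultdict(float)
for span in spans:
    totals[span["model"]] += span["gen_ai.usage.cost.total"]

for model, cost in sorted(totals.items()):
    print(f"{model}: ${cost:.6f}")
```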

Supported Providers

Provider	Models	Pricing Type
OpenAI	GPT-4o, GPT-4 Turbo, GPT-5.2, o1/o3, embeddings (50+)	Per token (prompt/completion)
Anthropic	Claude Sonnet 4.6, Claude 3.5/3 series (15+)	Per token + cache pricing
Google AI	Gemini 2.5/2.0 Pro/Flash, PaLM 2 (30+)	Per token
AWS Bedrock	Titan, Claude, Llama, Mistral (25+)	Per token
Azure OpenAI	Same as OpenAI	Per token
Cohere	Command R/R+, Embed v4/v3, rerankers (15+)	Per token
Mistral AI	Large/Medium/Small, Mixtral, embeddings (20+)	Per token
Together AI	DeepSeek-R1, Llama 3.x, Qwen (25+)	Per token
Groq	Llama 3.x, Mixtral, Gemma (20+)	Per token
Ollama	All local models	Token tracking (free)
Vertex AI	Gemini models	Per token
Replicate	All models	Per second (hardware-based)
HuggingFace	Local models	Estimated (parameter-based)
Sarvam AI	sarvam-m, Saarika, Bulbul (12+)	Per token
Voyage AI	voyage-4/3.5/3 series (15+)	Per token
Jina AI	jina-embeddings-v3, jina-clip-v2 (5+)	Per token
Deepgram	Nova-3/2, Aura, Whisper (20+)	Per second/character
AssemblyAI	Universal-3, slam-1 (5+)	Per second
ElevenLabs	Multilingual v2, Turbo v2 (8+)	Per character
IBM Granite	Chat, vision, embeddings (10+)	Per token
DeepSeek	V3, R1, VL (15+)	Per token
Qwen/Alibaba	Qwen 3.5, VL, embeddings (25+)	Per token
xAI	Grok 4.20, Grok 4.1 (5+)	Per token

Special Pricing

Reasoning tokens: OpenAI o1/o3 series models have separate pricing for reasoning tokens
Cache pricing: Anthropic prompt caching is priced at separate read and write rates
Batch pricing: Some providers offer discounted batch pricing
Hardware pricing: Replicate charges per second of GPU/CPU time

Custom Model Pricing

For models not in the pricing database, set the GENAI_CUSTOM_PRICING_JSON environment variable:

# Chat models
export GENAI_CUSTOM_PRICING_JSON='{"chat":{"my-model":{"promptPrice":0.001,"completionPrice":0.002}}}'

# Embeddings
export GENAI_CUSTOM_PRICING_JSON='{"embeddings":{"my-embed":0.00005}}'

# Multiple categories
export GENAI_CUSTOM_PRICING_JSON='{
  "chat": {
    "my-custom-chat": {"promptPrice": 0.001, "completionPrice": 0.002}
  },
  "embeddings": {
    "my-custom-embed": 0.00005
  }
}'

Custom prices merge with defaults. If you provide pricing for an existing model, the custom price overrides the default.
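
The override behaviour described above resembles a per-category dictionary merge. A sketch under that assumption: `DEFAULTS`, its prices, and the `merge_pricing` helper are hypothetical and do not reflect the actual contents of llm_pricing.json or TraceVerde's merge code.

```python
import json

# Hypothetical built-in defaults (illustrative prices only).
DEFAULTS = {
    "chat": {"existing-model": {"promptPrice": 0.0005, "completionPrice": 0.0015}},
    "embeddings": {"existing-embed": 0.0001},
}


def merge_pricing(defaults: dict, custom_json: str) -> dict:
    """Merge custom pricing into a copy of the defaults, category by category."""
    custom = json.loads(custom_json)
    merged = {category: dict(models) for category, models in defaults.items()}
    for category, models in custom.items():
        # Custom entries win over defaults that share the same model name.
        merged.setdefault(category, {}).update(models)
    return merged


custom_json = json.dumps({
    "chat": {
        "existing-model": {"promptPrice": 0.001, "completionPrice": 0.002},
        "my-model": {"promptPrice": 0.003, "completionPrice": 0.004},
    }
})
merged = merge_pricing(DEFAULTS, custom_json)
print(json.dumps(merged, indent=2))
```

In this sketch the custom price replaces the default for `existing-model`, `my-model` is added, and the embeddings category is left untouched.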

Pricing format:

Chat models: {"promptPrice": <$/1k tokens>, "completionPrice": <$/1k tokens>}
Embeddings: Single number for price per 1k tokens
Audio: Price per 1k characters (TTS) or per second (STT)
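
Hand-writing JSON inside shell quotes is error-prone, so one option is to build the value in Python and let `json.dumps` handle the quoting. This is a sketch, assuming the variable is set before TraceVerde is initialized in the same process (this page does not state when the variable is read); the model names and prices are placeholders.

```python
import json
import os

# Placeholder models and prices, in USD per 1k tokens.
custom_pricing = {
    "chat": {
        "my-custom-chat": {"promptPrice": 0.001, "completionPrice": 0.002},
    },
    "embeddings": {
        "my-custom-embed": 0.00005,
    },
}

# Basic shape check: every chat entry needs both price fields.
for name, price in custom_pricing["chat"].items():
    missing = {"promptPrice", "completionPrice"} - price.keys()
    if missing:
        raise ValueError(f"{name} is missing {sorted(missing)}")

os.environ["GENAI_CUSTOM_PRICING_JSON"] = json.dumps(custom_pricing)
print(os.environ["GENAI_CUSTOM_PRICING_JSON"])
```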

OpenInference Cost Enrichment

When using OpenInference instrumentors (LiteLLM, Smolagents, MCP), cost tracking is automatically applied via CostEnrichmentSpanProcessor. It reads OpenInference semantic conventions and adds cost attributes:

llm.model_name -> model lookup
llm.token_count.prompt / llm.token_count.completion -> cost calculation
openinference.span.kind -> call type (LLM, EMBEDDING, etc.)
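
The attribute mapping above can be illustrated with a small function over a dictionary of span attributes. This is a hypothetical sketch, not the CostEnrichmentSpanProcessor implementation: the `enrich` helper, the `PRICES` table, and its values are invented, only LLM spans are handled, and prices are assumed to be USD per 1k tokens.

```python
# Hypothetical price table in USD per 1k tokens (illustrative only).
PRICES = {"my-model": {"promptPrice": 0.001, "completionPrice": 0.002}}


def enrich(attrs: dict) -> dict:
    """Derive cost attributes from OpenInference-style span attributes."""
    # Assumption for this sketch: only spans of kind LLM are priced.
    if attrs.get("openinference.span.kind") != "LLM":
        return {}
    price = PRICES.get(attrs.get("llm.model_name"))
    if price is None:
        return {}
    prompt = attrs.get("llm.token_count.prompt", 0) / 1000 * price["promptPrice"]
    completion = attrs.get("llm.token_count.completion", 0) / 1000 * price["completionPrice"]
    return {
        "gen_ai.usage.cost.prompt": round(prompt, 6),
        "gen_ai.usage.cost.completion": round(completion, 6),
        "gen_ai.usage.cost.total": round(prompt + completion, 6),
    }


span_attrs = {
    "openinference.span.kind": "LLM",
    "llm.model_name": "my-model",
    "llm.token_count.prompt": 2000,
    "llm.token_count.completion": 500,
}
enriched = enrich(span_attrs)
print(enriched)
```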

Disable Cost Tracking

export GENAI_ENABLE_COST_TRACKING=false

Or programmatically:

import genai_otel
genai_otel.instrument(enable_cost_tracking=False)

Grafana Dashboard

Import the pre-built GenAI overview dashboard to visualize costs over time by provider and model.