Overview
tokenspy is your local LLM profiler. One line. Runs on your machine. Forever free.
💡
One-line install: `pip install tokenspy` — zero required dependencies.
The Problem
You get an OpenAI invoice — $800 this month. You have no idea which function caused it. Hosted platforms like Langfuse and Braintrust typically require an account, send trace data to their cloud, and charge per seat.
tokenspy is your local alternative. It wraps your existing LLM calls, records every token, and shows you exactly where your costs are going — with a live dashboard, flame graph, and trace explorer. All on your machine. No accounts. No cloud.
What's in v0.2.0
🔥 Cost Flame Graphs
See exactly which function is burning your budget. Drill down into nested calls.
🔍 Structured Tracing
Full trace + span tree with inputs, outputs, token counts, and latency.
📊 Evaluations
Run LLM functions against golden test sets. Track pass/fail over time.
📝 Prompt Versioning
Every prompt version stored. Diff when costs spike. Roll back instantly.
📺 Live Dashboard
Web UI with cost charts, trace explorer, and token heatmaps. Launch with `tokenspy serve`.
📡 OpenTelemetry
Export spans to Grafana, Jaeger, Datadog, or any OTLP collector.
Why tokenspy over Langfuse / Braintrust?
| Feature | tokenspy | Langfuse / Braintrust |
| --- | --- | --- |
| Account required | None | Yes |
| Data leaves your machine | Never | Always |
| Setup time | 30 seconds | 15–30 min |
| Cost | Free forever | Paid tiers |
| Works offline | Yes | No |
| License | MIT | Various |
Installation
Choose the install variant that matches your stack.
```shell
pip install tokenspy                # zero deps — core profiling
pip install 'tokenspy[openai]'      # + OpenAI SDK
pip install 'tokenspy[anthropic]'   # + Anthropic SDK
pip install 'tokenspy[langchain]'   # + LangChain integration
pip install 'tokenspy[all]'         # everything
```
ℹ️
tokenspy has zero required dependencies. It monkey-patches the LLM client you already have installed. Install only what you use.
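Zero-dependency instrumentation of an already-installed SDK usually comes down to plain monkey-patching: replace a client method with a wrapper that records usage and then delegates. A minimal sketch of the idea — not tokenspy's actual internals; `FakeClient` stands in for a real SDK client:

```python
import functools

class FakeClient:
    """Stand-in for an LLM client; a real target would be e.g. an OpenAI client class."""
    def complete(self, prompt):
        return {"text": "ok", "usage": {"total_tokens": len(prompt.split())}}

token_totals = {"total_tokens": 0}

def instrument(cls, method_name):
    """Swap a method on the class for a wrapper that records token usage."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        response = original(self, *args, **kwargs)
        token_totals["total_tokens"] += response["usage"]["total_tokens"]
        return response

    setattr(cls, method_name, wrapper)

instrument(FakeClient, "complete")
client = FakeClient()          # existing calling code is unchanged
client.complete("hello there world")
```

Because the patch happens at the class level, code that already constructed a client picks up the instrumentation too — which is why "no other changes needed" is plausible.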
Requirements
- Python 3.10 or newer
- Any LLM SDK: `openai`, `anthropic`, `google-generativeai`, or compatible
Quick Start
Two lines to get cost tracking. That's it.
```python
import tokenspy

tokenspy.init()  # wraps your LLM SDK — no other changes needed

# Your existing code works unchanged
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)

tokenspy.summary()
# ┌─────────────────┬────────┬──────────┬──────────┐
# │ Function        │ Tokens │ Cost     │ Calls    │
# ├─────────────────┼────────┼──────────┼──────────┤
# │ __main__        │  1,234 │ $0.00037 │        3 │
# └─────────────────┴────────┴──────────┴──────────┘
```
Decorator approach
```python
@tokenspy.watch
def my_llm_chain(user_input):
    step1 = call_llm(f"Plan: {user_input}")
    step2 = call_llm(f"Execute: {step1}")
    return step2

result = my_llm_chain("write a report")
tokenspy.flame()  # → ASCII flame graph of cost by function
```
Open the dashboard
```shell
tokenspy serve  # opens localhost:7234
```
Cost Profiling
tokenspy's core feature: see exactly which function in your call tree is spending money.
```python
import tokenspy

tokenspy.init()

@tokenspy.watch
def summarize(text):
    return call_llm(f"Summarize: {text}")

@tokenspy.watch
def analyze(docs):
    summaries = [summarize(d) for d in docs]
    return call_llm(f"Analyze: {summaries}")

analyze(my_documents)

# Print flame graph
tokenspy.flame()

# Or get structured data
report = tokenspy.report()
print(report.most_expensive_function)  # "analyze"
print(report.total_cost_usd)           # 0.0234
```
💡
The flame graph uses ASCII art — it works in any terminal, CI logs, and notebooks without extra dependencies.
Structured Tracing
Every LLM call is recorded as a span with full inputs, outputs, token counts, model, and latency.
```python
with tokenspy.trace("my-pipeline", input=user_query) as t:
    result = run_pipeline(user_query)
    t.output = result
    t.metadata = {"user_id": "u123", "session": "abc"}

# View in dashboard or print
tokenspy.traces()  # last 50 traces
tokenspy.spans()   # all spans with parent-child tree
```
Nested spans
```python
with tokenspy.trace("outer") as outer:
    with tokenspy.trace("inner-retrieval") as ret:
        docs = retrieve(query)
        ret.metadata = {"num_docs": len(docs)}
    with tokenspy.trace("inner-generation") as gen:
        answer = generate(query, docs)
        gen.output = answer
```
Evaluations
Run your LLM function against a dataset and score the results. Track pass rates over time.
```python
ds = tokenspy.dataset("my-eval")
ds.add(input="What is 2+2?", expected_output="4")
ds.add(input="Capital of France?", expected_output="Paris")

exp = tokenspy.experiment(
    "gpt-4o-mini-eval",
    dataset=ds,
    fn=my_chatbot,
    scorers=[
        tokenspy.scorers.exact_match,
        tokenspy.scorers.no_pii,
        tokenspy.scorers.latency_under(2.0),
    ],
    pass_threshold=0.9,
)

results = exp.run()
results.summary()
# pass_rate: 0.95 | avg_cost: $0.0003 | avg_latency: 0.8s
```
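The built-in scorers above suggest that scorers are plain callables, with `latency_under(2.0)` acting as a factory that returns one. Under that assumption, a custom scorer is a few lines of Python — the `(output, expected)` signature here is illustrative, not tokenspy's documented API:

```python
def exact_match(output, expected):
    """Illustrative scorer: 1.0 when the trimmed output matches expected exactly."""
    return 1.0 if output.strip() == expected.strip() else 0.0

def contains_keyword(keyword):
    """Illustrative scorer factory, mirroring latency_under(2.0):
    closes over a parameter and returns the actual scorer."""
    def scorer(output, expected):
        return 1.0 if keyword.lower() in output.lower() else 0.0
    return scorer
```

A parameterized scorer would then be passed as `contains_keyword("paris")`, just like `latency_under(2.0)` in the list above.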
Prompt Versioning
Every prompt you register is versioned and stored locally. When costs spike, diff your prompts to find the cause.
```python
prompt = tokenspy.prompt(
    name="summarizer",
    template="Summarize the following text in {n} words: {text}",
    version="1.0",
)

# Use it
result = call_llm(prompt.render(n=50, text=my_text))

# Update — old version is preserved
prompt.update(
    template="Write a {n}-word summary of: {text}",
    version="1.1",
    note="Shorter instruction reduces tokens by ~15%",
)

# Diff versions
tokenspy.prompts.diff("summarizer", "1.0", "1.1")
```
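The diff itself needs nothing exotic: Python's standard `difflib` already produces a unified diff from two template strings. A self-contained sketch of the idea, using the two versions from the example above:

```python
import difflib

# A version store at its simplest: version string -> template text
versions = {
    "1.0": "Summarize the following text in {n} words: {text}",
    "1.1": "Write a {n}-word summary of: {text}",
}

diff = list(difflib.unified_diff(
    [versions["1.0"]],
    [versions["1.1"]],
    fromfile="summarizer@1.0",
    tofile="summarizer@1.1",
    lineterm="",
))
print("\n".join(diff))
```

Because old versions are never overwritten, answering "what changed since costs spiked" reduces to diffing the version in use then against the version in use now.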
Live Dashboard
A local web UI showing cost charts, trace explorer, evaluation results, and token heatmaps. Zero configuration.
```shell
tokenspy serve              # opens at http://localhost:7234
tokenspy serve --port 8080
```
Dashboard tabs
- Overview — total spend, top functions, cost trend
- Traces — searchable trace list with span tree
- Flame Graph — visual cost breakdown by call depth
- Prompts — version history and diff viewer
- Evaluations — experiment results over time
- Costs — per-model, per-function breakdown
OpenTelemetry
Export tokenspy spans to any OTLP-compatible collector — Grafana, Jaeger, Datadog, Honeycomb, etc.
```python
tokenspy.init(
    otel_endpoint="http://localhost:4317",  # OTLP gRPC
    otel_service_name="my-llm-app",
)
# All spans now export to your collector automatically
```
ℹ️
Spans follow the OpenTelemetry semantic conventions for LLMs. Compatible with the opentelemetry-sdk package.
LangChain Integration
Drop-in callback handler — no changes to your chain code.
```python
import tokenspy
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from tokenspy.integrations.langchain import TokenSpyCallbackHandler

handler = TokenSpyCallbackHandler()

chain = (
    ChatPromptTemplate.from_template("{question}")
    | ChatOpenAI(model="gpt-4o-mini", callbacks=[handler])
    | StrOutputParser()
)

result = chain.invoke({"question": "What is LangChain?"})
tokenspy.summary()
```
GitHub Actions
Run cost regressions in CI. Fail the PR if costs spike vs. main.
```yaml
# .github/workflows/cost-check.yml
name: LLM Cost Check
on: [pull_request]
jobs:
  cost:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install 'tokenspy[openai]'
      - run: tokenspy eval run evals/suite.py --git-compare origin/main --fail-on-regression
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```
💡
Every run is tagged with the git commit SHA. tokenspy writes a cost summary directly to $GITHUB_STEP_SUMMARY.
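`$GITHUB_STEP_SUMMARY` is a standard GitHub Actions mechanism: the variable points at a file, and any markdown a step appends to it is rendered on the workflow run page. A sketch of producing such a summary by hand — the `write_step_summary` helper and its numbers are illustrative, not part of tokenspy:

```python
import os

def write_step_summary(total_cost, baseline_cost, path=None):
    """Append a markdown cost table to the step-summary file.

    Outside CI (no GITHUB_STEP_SUMMARY and no explicit path),
    the rendered body is only returned, not written anywhere.
    """
    delta = total_cost - baseline_cost
    lines = [
        "## LLM Cost Check",
        "| Branch | Cost |",
        "| --- | --- |",
        f"| this PR | ${total_cost:.4f} |",
        f"| main | ${baseline_cost:.4f} |",
        f"| delta | ${delta:+.4f} |",
    ]
    body = "\n".join(lines) + "\n"
    target = path or os.environ.get("GITHUB_STEP_SUMMARY")
    if target:
        with open(target, "a") as f:
            f.write(body)
    return body

summary = write_step_summary(0.0123, 0.0100, path=os.devnull)
```

In a real workflow step you would omit `path` so the helper picks up `GITHUB_STEP_SUMMARY` from the environment.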
CLI Reference
```shell
# Core
tokenspy summary            # cost summary for recent runs
tokenspy flame              # ASCII flame graph
tokenspy traces             # list recent traces
tokenspy spans              # detailed span tree

# Dashboard
tokenspy serve              # open at :7234
tokenspy serve --port 8080

# Evaluations
tokenspy eval run myevals.py
tokenspy eval list
tokenspy eval show <name>

# Prompts
tokenspy prompts list
tokenspy prompts diff <name> 1.0 1.1

# Export
tokenspy export --format json --output traces.json
tokenspy export --format otel --endpoint http://localhost:4317
```
Pricing Table
tokenspy ships with a built-in pricing table for all major providers. Updated with each release.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| claude-3-5-sonnet | $3.00 | $15.00 |
| claude-3-haiku | $0.25 | $1.25 |
| gemini-1.5-flash | $0.075 | $0.30 |
| gemini-1.5-pro | $1.25 | $5.00 |
ℹ️
Override any price: `tokenspy.pricing.set("my-model", input=0.001, output=0.002)`
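Cost accounting from this table is simple arithmetic: token count divided by one million, times the per-million rate, summed over both directions. A self-contained sketch with prices copied from the table above:

```python
# USD per 1M tokens, taken from the pricing table above
PRICING = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def cost_usd(model, input_tokens, output_tokens):
    """Cost of one call: per-direction token count scaled by the per-1M rate."""
    p = PRICING[model]
    return (input_tokens / 1e6) * p["input"] + (output_tokens / 1e6) * p["output"]

# 10k prompt tokens + 2k completion tokens on gpt-4o-mini:
# 0.01 * $0.15 + 0.002 * $0.60 = $0.0015 + $0.0012 = $0.0027
print(round(cost_usd("gpt-4o-mini", 10_000, 2_000), 6))
```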
Supported Providers
| Provider | SDK | Auto-detect |
| --- | --- | --- |
| OpenAI | `openai` | Yes |
| Anthropic | `anthropic` | Yes |
| Google Gemini | `google-generativeai` | Yes |
| Azure OpenAI | `openai` (azure) | Yes |
| Ollama | `ollama` | Yes |
| LiteLLM | `litellm` | Yes |
| LangChain | callback handler | Manual |