Overview
tokenspy is your local LLM profiler. One line. Runs on your machine. Forever free.
💡
One-line install: `pip install tokenspy` — zero required dependencies.
The Problem
You get an OpenAI invoice — $800 this month. You have no idea which function caused it. Hosted platforms like Langfuse and Braintrust typically require an account, send trace data to their cloud, and charge per seat.
tokenspy is your local alternative. It wraps your existing LLM calls, records every token, and shows you exactly where your costs are going — with a live dashboard, flame graph, and trace explorer. All on your machine. No accounts. No cloud.
What's in v0.2.0
🔥 Cost Flame Graphs
See exactly which function is burning your budget. Drill down into nested calls.
🔍 Structured Tracing
Full trace + span tree with inputs, outputs, token counts, and latency.
📊 Evaluations
Run LLM functions against golden test sets. Track pass/fail over time.
📝 Prompt Versioning
Every prompt version stored. Diff when costs spike. Roll back instantly.
📺 Live Dashboard
Web UI with cost charts, trace explorer, and token heatmaps. Launch with `tokenspy serve`.
📡 OpenTelemetry
Export spans to Grafana, Jaeger, Datadog, or any OTLP collector.
Why tokenspy over Langfuse / Braintrust?
| Feature | tokenspy | Langfuse / Braintrust |
| --- | --- | --- |
| Account required | None | Yes |
| Data leaves your machine | Never | Always |
| Setup time | 30 seconds | 15–30 min |
| Cost | Free forever | Paid tiers |
| Works offline | Yes | No |
| License | MIT | Various |
Installation
Choose the install variant that matches your stack.
```shell
pip install tokenspy                # zero deps — core profiling
pip install 'tokenspy[openai]'      # + OpenAI SDK
pip install 'tokenspy[anthropic]'   # + Anthropic SDK
pip install 'tokenspy[langchain]'   # + LangChain integration
pip install 'tokenspy[all]'         # everything
```
ℹ️
tokenspy has zero required dependencies. It monkey-patches the LLM client you already have installed. Install only what you use.
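Zero-dependency instrumentation of an already-installed SDK usually comes down to plain monkey-patching: replace a client method with a wrapper that records usage and then delegates. A minimal sketch of the idea — not tokenspy's actual internals; `FakeClient` stands in for a real SDK client:

```python
import functools

class FakeClient:
    """Stand-in for an LLM client; a real target would be e.g. an OpenAI client class."""
    def complete(self, prompt):
        return {"text": "ok", "usage": {"total_tokens": len(prompt.split())}}

token_totals = {"total_tokens": 0}

def instrument(cls, method_name):
    """Swap a method on the class for a wrapper that records token usage."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        response = original(self, *args, **kwargs)
        token_totals["total_tokens"] += response["usage"]["total_tokens"]
        return response

    setattr(cls, method_name, wrapper)

instrument(FakeClient, "complete")
client = FakeClient()          # existing calling code is unchanged
client.complete("hello there world")
```

Because the patch happens at the class level, code that already constructed a client picks up the instrumentation too — which is why "no other changes needed" is plausible.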
Requirements
- Python 3.10 or newer
- Any LLM SDK: `openai`, `anthropic`, `google-generativeai`, or compatible
Quick Start
Two lines to get cost tracking. That's it.
```python
import tokenspy

tokenspy.init()  # wraps your LLM SDK — no other changes needed

# Your existing code works unchanged
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)

tokenspy.summary()
# ┌─────────────────┬────────┬──────────┬──────────┐
# │ Function        │ Tokens │ Cost     │ Calls    │
# ├─────────────────┼────────┼──────────┼──────────┤
# │ __main__        │  1,234 │ $0.00037 │        3 │
# └─────────────────┴────────┴──────────┴──────────┘
```
Decorator approach
```python
@tokenspy.watch
def my_llm_chain(user_input):
    step1 = call_llm(f"Plan: {user_input}")
    step2 = call_llm(f"Execute: {step1}")
    return step2

result = my_llm_chain("write a report")
tokenspy.flame()  # → ASCII flame graph of cost by function
```
Open the dashboard
```shell
tokenspy serve  # opens localhost:7234
```
Cost Profiling
tokenspy's core feature: see exactly which function in your call tree is spending money.
```python
import tokenspy

tokenspy.init()

@tokenspy.watch
def summarize(text):
    return call_llm(f"Summarize: {text}")

@tokenspy.watch
def analyze(docs):
    summaries = [summarize(d) for d in docs]
    return call_llm(f"Analyze: {summaries}")

analyze(my_documents)

# Print flame graph
tokenspy.flame()

# Or get structured data
report = tokenspy.report()
print(report.most_expensive_function)  # "analyze"
print(report.total_cost_usd)           # 0.0234
```
💡
The flame graph uses ASCII art — it works in any terminal, CI logs, and notebooks without extra dependencies.
Structured Tracing
Every LLM call is recorded as a span with full inputs, outputs, token counts, model, and latency.
```python
with tokenspy.trace("my-pipeline", input=user_query) as t:
    result = run_pipeline(user_query)
    t.output = result
    t.metadata = {"user_id": "u123", "session": "abc"}

# View in dashboard or print
tokenspy.traces()  # last 50 traces
tokenspy.spans()   # all spans with parent-child tree
```
Nested spans
```python
with tokenspy.trace("outer") as outer:
    with tokenspy.trace("inner-retrieval") as ret:
        docs = retrieve(query)
        ret.metadata = {"num_docs": len(docs)}
    with tokenspy.trace("inner-generation") as gen:
        answer = generate(query, docs)
        gen.output = answer
```
Evaluations
Run your LLM function against a dataset and score the results. Track pass rates over time.
```python
ds = tokenspy.dataset("my-eval")
ds.add(input="What is 2+2?", expected_output="4")
ds.add(input="Capital of France?", expected_output="Paris")

exp = tokenspy.experiment(
    "gpt-4o-mini-eval",
    dataset=ds,
    fn=my_chatbot,
    scorers=[
        tokenspy.scorers.exact_match,
        tokenspy.scorers.no_pii,
        tokenspy.scorers.latency_under(2.0),
    ],
    pass_threshold=0.9,
)

results = exp.run()
results.summary()
# pass_rate: 0.95 | avg_cost: $0.0003 | avg_latency: 0.8s
```
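The built-in scorers above suggest that scorers are plain callables, with `latency_under(2.0)` acting as a factory that returns one. Under that assumption, a custom scorer is a few lines of Python — the `(output, expected)` signature here is illustrative, not tokenspy's documented API:

```python
def exact_match(output, expected):
    """Illustrative scorer: 1.0 when the trimmed output matches expected exactly."""
    return 1.0 if output.strip() == expected.strip() else 0.0

def contains_keyword(keyword):
    """Illustrative scorer factory, mirroring latency_under(2.0):
    closes over a parameter and returns the actual scorer."""
    def scorer(output, expected):
        return 1.0 if keyword.lower() in output.lower() else 0.0
    return scorer
```

A parameterized scorer would then be passed as `contains_keyword("paris")`, just like `latency_under(2.0)` in the list above.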
Prompt Versioning
Every prompt you register is versioned and stored locally. When costs spike, diff your prompts to find the cause.
```python
prompt = tokenspy.prompt(
    name="summarizer",
    template="Summarize the following text in {n} words: {text}",
    version="1.0",
)

# Use it
result = call_llm(prompt.render(n=50, text=my_text))

# Update — old version is preserved
prompt.update(
    template="Write a {n}-word summary of: {text}",
    version="1.1",
    note="Shorter instruction reduces tokens by ~15%",
)

# Diff versions
tokenspy.prompts.diff("summarizer", "1.0", "1.1")
```
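The diff itself needs nothing exotic: Python's standard `difflib` already produces a unified diff from two template strings. A self-contained sketch of the idea, using the two versions from the example above:

```python
import difflib

# A version store at its simplest: version string -> template text
versions = {
    "1.0": "Summarize the following text in {n} words: {text}",
    "1.1": "Write a {n}-word summary of: {text}",
}

diff = list(difflib.unified_diff(
    [versions["1.0"]],
    [versions["1.1"]],
    fromfile="summarizer@1.0",
    tofile="summarizer@1.1",
    lineterm="",
))
print("\n".join(diff))
```

Because old versions are never overwritten, answering "what changed since costs spiked" reduces to diffing the version in use then against the version in use now.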
Live Dashboard
A local web UI showing cost charts, trace explorer, evaluation results, and token heatmaps. Zero configuration.
```shell
tokenspy serve              # opens at http://localhost:7234
tokenspy serve --port 8080
```
Dashboard tabs
- Overview — total spend, top functions, cost trend
- Traces — searchable trace list with span tree
- Flame Graph — visual cost breakdown by call depth
- Prompts — version history and diff viewer
- Evaluations — experiment results over time
- Costs — per-model, per-function breakdown
OpenTelemetry
Export tokenspy spans to any OTLP-compatible collector — Grafana, Jaeger, Datadog, Honeycomb, etc.
```python
tokenspy.init(
    otel_endpoint="http://localhost:4317",  # OTLP gRPC
    otel_service_name="my-llm-app",
)
# All spans now export to your collector automatically
```
ℹ️
Spans follow the OpenTelemetry semantic conventions for LLMs. Compatible with the opentelemetry-sdk package.
LangChain Integration
Drop-in callback handler — no changes to your chain code.
```python
import tokenspy
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from tokenspy.integrations.langchain import TokenSpyCallbackHandler

handler = TokenSpyCallbackHandler()

chain = (
    ChatPromptTemplate.from_template("{question}")
    | ChatOpenAI(model="gpt-4o-mini", callbacks=[handler])
    | StrOutputParser()
)

result = chain.invoke({"question": "What is LangChain?"})
tokenspy.summary()
```
GitHub Actions
Run cost regressions in CI. Fail the PR if costs spike vs. main.
```yaml
# .github/workflows/cost-check.yml
name: LLM Cost Check
on: [pull_request]
jobs:
  cost:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install 'tokenspy[openai]'
      - run: tokenspy eval run evals/suite.py --git-compare origin/main --fail-on-regression
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```
💡
Every run is tagged with the git commit SHA. tokenspy writes a cost summary directly to $GITHUB_STEP_SUMMARY.
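`$GITHUB_STEP_SUMMARY` is a standard GitHub Actions mechanism: the variable points at a file, and any markdown a step appends to it is rendered on the workflow run page. A sketch of producing such a summary by hand — the `write_step_summary` helper and its numbers are illustrative, not part of tokenspy:

```python
import os

def write_step_summary(total_cost, baseline_cost, path=None):
    """Append a markdown cost table to the step-summary file.

    Outside CI (no GITHUB_STEP_SUMMARY and no explicit path),
    the rendered body is only returned, not written anywhere.
    """
    delta = total_cost - baseline_cost
    lines = [
        "## LLM Cost Check",
        "| Branch | Cost |",
        "| --- | --- |",
        f"| this PR | ${total_cost:.4f} |",
        f"| main | ${baseline_cost:.4f} |",
        f"| delta | ${delta:+.4f} |",
    ]
    body = "\n".join(lines) + "\n"
    target = path or os.environ.get("GITHUB_STEP_SUMMARY")
    if target:
        with open(target, "a") as f:
            f.write(body)
    return body

summary = write_step_summary(0.0123, 0.0100, path=os.devnull)
```

In a real workflow step you would omit `path` so the helper picks up `GITHUB_STEP_SUMMARY` from the environment.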
CLI Reference
```shell
# Core
tokenspy summary            # cost summary for recent runs
tokenspy flame              # ASCII flame graph
tokenspy traces             # list recent traces
tokenspy spans              # detailed span tree

# Dashboard
tokenspy serve              # open at :7234
tokenspy serve --port 8080

# Evaluations
tokenspy eval run myevals.py
tokenspy eval list
tokenspy eval show <name>

# Prompts
tokenspy prompts list
tokenspy prompts diff <name> 1.0 1.1

# Export
tokenspy export --format json --output traces.json
tokenspy export --format otel --endpoint http://localhost:4317
```
Pricing Table
tokenspy ships with a built-in pricing table for all major providers. Updated with each release.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| claude-3-5-sonnet | $3.00 | $15.00 |
| claude-3-haiku | $0.25 | $1.25 |
| gemini-1.5-flash | $0.075 | $0.30 |
| gemini-1.5-pro | $1.25 | $5.00 |
ℹ️
Override any price: `tokenspy.pricing.set("my-model", input=0.001, output=0.002)`
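Cost accounting from this table is simple arithmetic: token count divided by one million, times the per-million rate, summed over both directions. A self-contained sketch with prices copied from the table above:

```python
# USD per 1M tokens, taken from the pricing table above
PRICING = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def cost_usd(model, input_tokens, output_tokens):
    """Cost of one call: per-direction token count scaled by the per-1M rate."""
    p = PRICING[model]
    return (input_tokens / 1e6) * p["input"] + (output_tokens / 1e6) * p["output"]

# 10k prompt tokens + 2k completion tokens on gpt-4o-mini:
# 0.01 * $0.15 + 0.002 * $0.60 = $0.0015 + $0.0012 = $0.0027
print(round(cost_usd("gpt-4o-mini", 10_000, 2_000), 6))
```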
Supported Providers
| Provider | SDK | Auto-detect |
| --- | --- | --- |
| OpenAI | `openai` | Yes |
| Anthropic | `anthropic` | Yes |
| Google Gemini | `google-generativeai` | Yes |
| Azure OpenAI | `openai` (azure) | Yes |
| Ollama | `ollama` | Yes |
| LiteLLM | `litellm` | Yes |
| LangChain | callback handler | Manual |