tokenspy 🔥
cProfile for LLMs — find which function is burning your AI budget. Flame graphs, structured tracing, evaluations, prompt versioning, and a live dashboard.
pip install tokenspy — zero dependencies.

The Problem
You get an OpenAI invoice — $800 this month. You have no idea which function caused it. Langfuse and Braintrust force you to reroute traffic through their cloud proxy.
tokenspy is your local alternative. One line. Runs on your machine. Forever free.
What's in v0.2.0
Installation
Choose the install variant that matches your stack.
pip install tokenspy # zero deps
pip install tokenspy[openai]
pip install tokenspy[anthropic]
pip install tokenspy[langchain]
pip install tokenspy[otel] # OpenTelemetry
pip install tokenspy[server] # Live dashboard
pip install tokenspy[all]
Quick Start
Get cost visibility in under 60 seconds.
Minimal — 1 Line
import tokenspy
import openai

@tokenspy.profile
def my_function():
    return openai.chat.completions.create(model="gpt-4o", messages=[...])

my_function()
tokenspy.report()
Full Setup
import tokenspy

tokenspy.init(persist=True, track_git=True, otel_endpoint="http://localhost:4317")

with tokenspy.trace("my_pipeline", input={"query": q}) as t:
    with tokenspy.span("retrieve") as s:
        docs = fetch(q)
        s.update(output=docs)
    with tokenspy.span("generate", span_type="llm") as s:
        answer = llm_call(docs)
    t.update(output=answer)
    t.score("quality", 0.9)

tokenspy.report()
Run tokenspy serve to open the dashboard at localhost:7234.

Cost Profiling
See which function is burning your AI budget.
@tokenspy.profile
def run_pipeline(query):
    docs = fetch_and_summarize(query)
    entities = extract_entities(docs)
    return generate_report(entities)

run_pipeline("Analyze Q3 earnings")
tokenspy.report()
Budget Alerts
@tokenspy.profile(budget_usd=0.10)
def my_agent(query): ...
@tokenspy.profile(budget_usd=0.10, on_exceeded="raise")
def strict_agent(query): ...
Structured Tracing
See exactly what happens inside every LLM call.
LLM calls made inside a span are automatically linked — no manual wiring.
with tokenspy.trace("research", input={"query": "climate change"}) as t:
    with tokenspy.span("retrieve", span_type="retrieval") as s:
        docs = vector_store.search("climate change", top_k=5)
        s.update(output={"n_docs": len(docs)})
    with tokenspy.span("summarize", span_type="llm") as s:
        response = openai.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"Summarize: {docs}"}]
        )
    t.score("relevance", 0.92, scorer="human")
Evaluations & Datasets
Run LLM functions against golden test sets and track quality.
ds = tokenspy.dataset("qa-golden")
ds.add(input={"question": "Capital of France?"}, expected_output="Paris")

exp = tokenspy.experiment(
    "gpt4o-mini-baseline",
    dataset="qa-golden",
    fn=answer_question,
    scorers=[scorers.exact_match, scorers.contains],
)
results = exp.run()
results.summary()
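The two built-in scorers named above can be sketched in a few lines. These are minimal illustrations; the signatures of tokenspy's actual scorers may differ.

```python
def exact_match(output: str, expected: str) -> float:
    """1.0 only if the stripped strings are identical."""
    return 1.0 if output.strip() == expected.strip() else 0.0

def contains(output: str, expected: str) -> float:
    """1.0 if the expected answer appears anywhere in the output (case-insensitive)."""
    return 1.0 if expected.lower() in output.lower() else 0.0
```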
Prompt Versioning
Track every version. Know which caused a cost spike.
p = tokenspy.prompts.push("summarizer",
    "Summarize in {{style}} style, max {{max_words}} words:\n\n{{text}}")
compiled = p.compile(style="concise", max_words=100, text="...")
tokenspy.prompts.set_production("summarizer", version=2)
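The `{{var}}` templating that `compile` performs boils down to a substitution pass; `render` below is an illustrative helper, not the tokenspy API.

```python
import re

def render(template: str, **vars) -> str:
    """Substitute {{name}} placeholders; fail loudly on a missing variable."""
    def sub(match):
        name = match.group(1)
        if name not in vars:
            raise KeyError(f"missing template variable: {name}")
        return str(vars[name])
    return re.sub(r"\{\{(\w+)\}\}", sub, template)

print(render("Summarize in {{style}} style, max {{max_words}} words:\n\n{{text}}",
             style="concise", max_words=100, text="..."))
```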
Live Dashboard
Web UI for costs, traces, evaluations, and prompts.
tokenspy serve # localhost:7234
tokenspy serve --port 8080 --db custom.db
5 tabs: Overview, Traces, Evaluations, Prompts, Settings.
pip install tokenspy[server]
OpenTelemetry Export
Send data to Grafana, Jaeger, Datadog, or any OTEL backend.
tokenspy.init(persist=True, otel_endpoint="http://localhost:4317", otel_service_name="my-app")
pip install tokenspy[otel]
LangChain Integration
Use tokenspy with LangChain and LangGraph.
from tokenspy.integrations.langchain import TokenspyCallbackHandler
chain.invoke(prompt, config={"callbacks": [TokenspyCallbackHandler()]})
GitHub Actions — Cost Diff
Catch cost regressions in CI before they ship.
from tokenspy.ci import annotate_cost_diff
annotate_cost_diff("current_run.db", "baseline.db")
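At its core the cost diff is a per-function delta between two runs. The real tool reads both sides from SQLite; plain dicts stand in here, and `cost_diff` is an illustrative name.

```python
def cost_diff(current, baseline):
    """Return {function: usd_delta}; positive means the current run got pricier."""
    names = set(current) | set(baseline)
    return {n: round(current.get(n, 0.0) - baseline.get(n, 0.0), 6) for n in names}

diff = cost_diff({"generate": 0.42, "retrieve": 0.05},
                 {"generate": 0.30, "retrieve": 0.05})
print(diff)  # generate is $0.12 more expensive than baseline
```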
CLI Reference
tokenspy history [--limit 50]
tokenspy report [--format html]
tokenspy compare --commit abc123 --commit def456
tokenspy serve [--port 8080] [--no-open]
Built-in Pricing Table
30+ models, updated March 2026. No network calls.
| Model | Input $/1M | Output $/1M |
|---|---|---|
| claude-opus-4-6 | $15.00 | $75.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gemini-1.5-pro | $1.25 | $5.00 |
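The arithmetic behind the table is simple: tokens divided by one million, times the per-million rate. `PRICING` and `call_cost` below are illustrative names with prices copied from the table, not the library's API.

```python
# $ per 1M tokens, (input, output), copied from the pricing table above.
PRICING = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-sonnet-4-6": (3.00, 15.00),
}

def call_cost(model, input_tokens, output_tokens):
    """USD cost of one call, rounded to 6 decimal places."""
    in_price, out_price = PRICING[model]
    cost = input_tokens / 1_000_000 * in_price + output_tokens / 1_000_000 * out_price
    return round(cost, 6)

print(call_cost("gpt-4o", 1200, 350))  # 1200 prompt + 350 completion tokens
```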
Supported Providers
| Provider | Package | Auto-detected |
|---|---|---|
| OpenAI | openai>=1.0 | chat.completions.create |
| Anthropic | anthropic>=0.30 | messages.create |
| Google | google-generativeai>=0.7 | generate_content |
| LangChain | langchain-core>=0.2 | Callback handler |
agent-memory 🧠
Production-ready persistent memory for AI agents. Works with LangChain, CrewAI, AutoGen, and raw SDKs — in 3 lines.
pip install agentcortex

The Problem
Every time your agent starts a new session, it starts from zero. This isn't an AI limitation — it's a missing infrastructure layer.
Features
Installation
pip install agentcortex # minimal
pip install "agentcortex[chromadb,local]" # semantic search (recommended)
pip install "agentcortex[all]" # everything
pip install "agentcortex[mcp]" # MCP server
pip install "agentcortex[qdrant]" # production backend
pip install "agentcortex[autogen]" # AutoGen adapter
Quick Start
Add persistent memory in 3 lines.
from agentmemory import MemoryStore
memory = MemoryStore(agent_id="my-agent")
memory.remember("User's name is Alice, building fraud detection")
context = memory.get_context("What do we know about the user?")
Memory persists to disk. It's there next session, and the one after that.
Memory Architecture
Three-tier system mirroring human memory.
Working Memory — current conversation. Auto-compresses when nearing the token limit.
Episodic Memory — recent interactions in SQLite. Evicts low-importance entries.
Semantic Memory — long-term facts as vector embeddings (ChromaDB). Retrieved by meaning.
┌──────────────────────────────────────────┐
│ MemoryStore │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Working │ │ Episodic │ │ Semantic │ │
│ │ (RAM) │ │ (SQLite) │ │ (Chroma) │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└──────────────────────────────────────────┘
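The episodic tier's eviction policy can be sketched as: when the store is full, drop the lowest-importance entries first (oldest first among ties). This is an illustration of the idea, not agentmemory's actual implementation.

```python
def evict(entries, max_entries):
    """Keep the max_entries highest-importance entries, preserving insert order."""
    if len(entries) <= max_entries:
        return entries
    # Rank by importance descending, then by original position ascending.
    ranked = sorted(enumerate(entries), key=lambda p: (-p[1]["importance"], p[0]))
    keep = sorted(ranked[:max_entries], key=lambda p: p[0])
    return [e for _, e in keep]

log = [{"text": "greeted user", "importance": 2},
       {"text": "user is Alice", "importance": 9},
       {"text": "small talk", "importance": 1},
       {"text": "builds fraud detection", "importance": 8}]
print(evict(log, 2))  # keeps the two high-importance facts, in original order
```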
API Reference
MemoryStore(
    agent_id: str,
    persist_dir: str = "~/.agentmemory",
    max_working_tokens: int = 4096,
    semantic_backend: str = "chromadb",
    embedding_provider: str = "sentence-transformers",
    llm_provider: str = "anthropic",
    enable_dedup: bool = True,
    auto_compress: bool = True,
)
| Method | Description |
|---|---|
| remember(content, importance=5) | Store a fact in episodic + semantic |
| recall(query, n=5) | Top-n relevant memories by meaning |
| get_context(query, max_tokens=500) | Formatted context for the system prompt |
| add_message(role, content) | Track a conversation turn |
| get_messages() | Working memory as [{role, content}] |
| compress() | Manual compression trigger |
| stats() | Usage across all tiers |
| clear(tiers=None) | Clear specific or all tiers |
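`get_context`'s token budgeting amounts to packing the most relevant memories into a prompt block without exceeding `max_tokens`. The sketch below assumes a crude 4-characters-per-token estimate in place of a real tokenizer, and `build_context` is an illustrative name.

```python
def estimate_tokens(text):
    """Rough heuristic: ~4 characters per token."""
    return max(1, len(text) // 4)

def build_context(memories, max_tokens=500):
    """Pack memories (assumed pre-sorted by relevance) into a token budget."""
    header = "Relevant memories:"
    budget = max_tokens - estimate_tokens(header)
    lines = []
    for m in memories:
        cost = estimate_tokens("- " + m)
        if cost > budget:
            break
        lines.append("- " + m)
        budget -= cost
    return "\n".join([header] + lines)

print(build_context(["User's name is Alice, building fraud detection"]))
```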
Async Support
from agentmemory import AsyncMemoryStore
async with AsyncMemoryStore(agent_id="my-agent") as memory:
    await memory.remember("User prefers Python", importance=7)
    results = await memory.recall("tech stack")
    context = await memory.get_context("What do we know?")
Anthropic Integration
from agentmemory import MemoryStore
import anthropic
memory = MemoryStore(agent_id="my-agent")
client = anthropic.Anthropic()
def chat(user_input: str) -> str:
    memory.add_message("user", user_input)
    response = client.messages.create(
        model="claude-sonnet-4-6", max_tokens=1024,
        system=f"You are helpful.\n\n{memory.get_context(user_input)}",
        messages=memory.get_messages(),
    )
    reply = response.content[0].text
    memory.add_message("assistant", reply)
    return reply
OpenAI Integration
from agentmemory.adapters.openai import MemoryOpenAI
client = MemoryOpenAI(agent_id="my-agent")
client.chat("Hi, I'm Alice")
# Next session...
client.chat("What's my name?") # → "Your name is Alice." ✅
LangChain Integration
from agentmemory import MemoryStore
from agentmemory.adapters.langchain import MemoryHistory, inject_memory_context
from langchain_anthropic import ChatAnthropic
memory = MemoryStore(agent_id="my-agent")
history = MemoryHistory(memory_store=memory)
llm = ChatAnthropic(model="claude-sonnet-4-6")
history.add_user_message("Hello, I'm Alice")
messages = inject_memory_context(history.messages, memory, query="Alice")
response = llm.invoke(messages)
CrewAI Integration
from crewai import Agent, Task
from agentmemory import MemoryStore
from agentmemory.adapters.crewai import CrewMemoryCallback, get_memory_context_for_agent

memory = MemoryStore(agent_id="research-crew")
agent = Agent(
    role="Researcher",
    backstory=get_memory_context_for_agent(memory, "Researcher") + "\nExpert.",
)
task = Task(description="Research AI memory", agent=agent,
            callback=CrewMemoryCallback(memory))
AutoGen Integration
import autogen
from agentmemory.adapters.autogen import AutoGenMemoryHook, get_autogen_memory_context

context = get_autogen_memory_context(memory, role="Research Assistant")
assistant = autogen.AssistantAgent(name="researcher",
    system_message=context + "\nYou are a helpful assistant.")
hook = AutoGenMemoryHook(memory, importance=6)
assistant.register_reply(trigger=autogen.ConversableAgent,
    reply_func=hook.on_agent_reply, position=0)
Install: pip install "agentcortex[autogen]"
MCP / Claude Code
Give your AI coding assistant permanent memory.
Step 1 — Install
pip install "agentcortex[mcp]"
Step 2 — Create .mcp.json
{
  "mcpServers": {
    "agentmemory": {
      "type": "stdio",
      "command": "python",
      "args": ["-m", "agentmemory.mcp_server"],
      "env": { "AGENTMEMORY_AGENT_ID": "your-project-name" }
    }
  }
}
Step 3
Open Claude Code → run /mcp → see agentmemory with 5 tools. Done.
| Tool | Description |
|---|---|
| get_context | Returns relevant memories for the current task |
| remember | Store a fact (importance 1–10) |
| recall | Semantic search over all memories |
| memory_stats | Memory counts across tiers |
| clear_memory | Reset memories |
Qdrant Backend
Scale to millions of vectors.
memory = MemoryStore(
    agent_id="my-agent",
    semantic_backend="qdrant",
    qdrant_url="http://localhost:6333",
)
Install: pip install "agentcortex[qdrant]"
Export / Import
memory.export_json("backup.json")
new_memory = MemoryStore(agent_id="new-agent")
new_memory.import_json("backup.json")
new_memory.import_json("backup.json", merge=True)
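Merge-mode import boils down to appending new memories while skipping duplicates. The sketch below dedups on exact text; agentmemory's actual dedup (with `enable_dedup=True`) may be embedding-based, and `merge_memories` is an illustrative name.

```python
def merge_memories(existing, imported):
    """Append imported memories, skipping exact duplicates, keeping order."""
    seen = set(existing)
    merged = list(existing)
    for m in imported:
        if m not in seen:
            merged.append(m)
            seen.add(m)
    return merged

print(merge_memories(["user is Alice"], ["user is Alice", "likes Python"]))
```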
Memory CLI
agentmemory inspect --agent-id my-project
agentmemory export --agent-id my-project --output memories.json
agentmemory import memories.json --agent-id new-project --merge
sentrix 🛡️
Red-team, eval, and monitor your LLMs — pure Python, zero config. Find vulnerabilities before your users do.
pip install sentrix — zero required dependencies.

What is sentrix?
sentrix is a Python-native LLM security suite. In one pip install, you get automated red teaming, vulnerability fingerprinting across models, adversarial test generation, compliance reporting, and production monitoring — with a local SQLite store and a built-in dashboard. No YAML. No Node.js.
Installation
pip install sentrix # core — zero required dependencies
pip install sentrix[server] # + FastAPI dashboard (sentrix serve)
pip install sentrix[eval] # + JSON schema validation scorer
pip install sentrix[full] # everything
Install only the LLM provider you use:
pip install openai # for OpenAI models
pip install anthropic # for Claude models
pip install google-generativeai # for Gemini models
# offline: ollama pull llama3 # no API key needed
Quick Start
import sentrix
sentrix.init() # enable SQLite persistence + cost tracking
def my_chatbot(prompt: str) -> str:
    return call_llm(prompt)
# Red team your chatbot
report = sentrix.red_team(my_chatbot, plugins=["jailbreak", "pii", "harmful"])
report.summary()
Or from the CLI:
sentrix scan myapp:chatbot --plugins jailbreak,pii,harmful --n 20
sentrix serve # open dashboard at localhost:7234
Red Teaming
Run the full attack suite against your LLM function. sentrix ships with 6 attack plugin categories, each with 15–20 attack templates.
report = sentrix.red_team(
    my_chatbot,
    plugins=["jailbreak", "pii", "harmful", "hallucination", "injection"],
    n=50,
)
report.summary()
# vuln_rate: 0.12 | high: 3 | medium: 8 | low: 15
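Those summary numbers fall out of simple aggregation over per-attack results. Below is an illustrative sketch — each result is a (succeeded?, severity) pair, and `summarize` is not sentrix's actual API.

```python
from collections import Counter

def summarize(results):
    """Aggregate (succeeded, severity) pairs into a report-style summary."""
    hits = [sev for ok, sev in results if ok]
    counts = Counter(hits)
    return {
        "vuln_rate": round(len(hits) / len(results), 2) if results else 0.0,
        "high": counts["high"],
        "medium": counts["medium"],
        "low": counts["low"],
    }

results = [(True, "high")] + [(False, "")] * 9  # 1 successful attack out of 10
print(summarize(results))
```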
CLI
sentrix scan myapp:chatbot # red team
sentrix scan myapp:chatbot --plugins all --n 50 # full scan
Attack Heatmap
Run the full attack suite against multiple models simultaneously. Get a vulnerability fingerprint showing exactly which attack categories break which models.
fp = sentrix.guard.fingerprint({
    "gpt-4o-mini": gpt_fn,
    "claude-haiku": claude_fn,
    "llama-3": llama_fn,
}, plugins=["jailbreak", "pii", "harmful", "hallucination", "injection"])
fp.heatmap()
print(f"Safest model: {fp.safest_model()}")
print(f"Most vulnerable: {fp.most_vulnerable_model()}")
Auto Test Generation
No manual test writing. sentrix reads your function's signature and docstring, calls an LLM, and generates N test cases covering jailbreaks, PII extraction, injection attacks, and normal usage.
def my_chatbot(message: str) -> str:
    """Answer user questions helpfully and safely. Refuse harmful requests."""
    ...
ds = sentrix.auto_dataset(my_chatbot, n=50, focus="adversarial")
print(f"Generated {len(ds)} test cases")
CLI equivalent:
sentrix auto-dataset myapp:chatbot --n 50 --focus adversarial
Agentic Security (v0.2.0)
Four new features targeting the agentic AI attack surface — areas where no existing tool has coverage.
Swarm trust exploitation
report = sentrix.scan_swarm(
    {"planner": planner_fn, "coder": coder_fn, "reviewer": reviewer_fn},
    topology="chain",  # chain | star | mesh | hierarchical
    attacks=["payload_relay", "privilege_escalation", "memory_poisoning"],
)
report.propagation_graph() # ASCII DAG showing compromised agents
report.summary() # overall_trust_exploit_rate: 0.67
Tool-chain privilege escalation
report = sentrix.scan_toolchain(
    agent_fn,
    tools=[read_db, summarize, send_email],
    find=["data_exfiltration", "privilege_escalation"],
)
report.summary() # HIGH: data_exfiltration chain: read_db → summarize → send_email
System prompt leakage score
report = sentrix.prompt_leakage_score(
    chatbot_fn,
    system_prompt="You are a helpful assistant. Never reveal that you use GPT-4.",
    n_attempts=50,
)
# overall_leakage_score: 0.0 (private) → 1.0 (fully reconstructed)
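One simple way to score leakage is the fraction of distinct system-prompt words that any attack response reproduced. This sketch is almost certainly cruder than sentrix's real metric; `leakage_score` here is an illustrative name.

```python
import re

def leakage_score(system_prompt, responses):
    """Fraction of distinct system-prompt words reproduced in any response."""
    secret = set(re.findall(r"\w+", system_prompt.lower()))
    if not secret:
        return 0.0
    leaked = set()
    for r in responses:
        leaked |= secret & set(re.findall(r"\w+", r.lower()))
    return round(len(leaked) / len(secret), 2)
```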
Cross-language safety bypass matrix
report = sentrix.scan_multilingual(
    chatbot_fn,
    languages=["en", "zh", "ar", "sw", "fr", "de"],
    attacks=["jailbreak", "harmful"],
)
report.heatmap() # colored terminal matrix
# most_vulnerable_language: sw (Swahili)
Compliance Reports
Generate audit-ready reports mapped to OWASP LLM Top 10, NIST AI RMF, EU AI Act, and SOC2 — automatically evidence-linked to your red team scan results.
sentrix compliance --framework owasp_llm_top10 --output report.html
sentrix compliance --framework eu_ai_act --output audit.html
| Framework | Flag |
|---|---|
| OWASP LLM Top 10 | owasp_llm_top10 |
| NIST AI RMF | nist_ai_rmf |
| EU AI Act | eu_ai_act |
| SOC2 | soc2 |
Production Monitoring
# Trace individual requests
with sentrix.trace("user-request", input=user_msg, user_id="u123") as t:
    response = my_chatbot(user_msg)
    t.output = response
# Watch for drift vs your baseline eval
sentrix monitor drift --baseline my-eval --window 24
# Alert on anomalies
sentrix monitor watch myapp:chatbot --interval 60 --webhook $SLACK_URL
Open the dashboard:
sentrix serve # → localhost:7234
GitHub Actions
Every scan is tagged with the git commit SHA. Block PRs if the vulnerability rate regresses vs. main.
sentrix scan myapp:chatbot --git-compare main --fail-on-regression
# exits 1 if vuln rate increased by >5% vs main branch
# writes summary to $GITHUB_STEP_SUMMARY
# .github/workflows/security.yml
- run: sentrix scan myapp:chatbot --git-compare origin/main --fail-on-regression
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
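The regression gate itself reduces to a threshold comparison on the two vulnerability rates. The sketch below assumes ">5%" means five percentage points; `regressed` is an illustrative name, not sentrix's API.

```python
def regressed(current_rate, baseline_rate, tolerance=0.05):
    """True when the vuln rate rose more than `tolerance` over the baseline."""
    return (current_rate - baseline_rate) > tolerance

# 0.18 vs 0.10 exceeds the 5-point tolerance, so CI should fail.
exit_code = 1 if regressed(0.18, 0.10) else 0
print(exit_code)
```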
Attack Plugins
| Plugin | What it probes |
|---|---|
| jailbreak | Role-play overrides, DAN variants, persona jailbreaks |
| pii | PII extraction, system prompt leakage, training data fishing |
| harmful | Dangerous information, CBRN, illegal activity requests |
| hallucination | False premises, leading questions, factual traps |
| injection | Indirect prompt injection via user-controlled data |
| competitor | Brand manipulation, competitor endorsement attacks |
Community plugins: sentrix plugin install <name>
CLI Reference
# Security scanning
sentrix scan myapp:chatbot
sentrix scan myapp:chatbot --plugins all --n 50
sentrix scan myapp:chatbot --git-compare main --fail-on-regression
sentrix fingerprint myapp:gpt_fn myapp:claude_fn
# Test generation
sentrix auto-dataset myapp:chatbot --n 50 --focus adversarial
# Agentic security (v0.2.0)
sentrix scan-swarm myapp:agents --topology chain
sentrix scan-toolchain myapp:agent --tools myapp:read_db,myapp:send_email
sentrix scan-prompt-leakage myapp:chatbot --system-prompt prompt.txt --n 50
sentrix scan-multilingual myapp:chatbot --languages en,zh,ar,sw
# Compliance
sentrix compliance --framework owasp_llm_top10 --output report.html
# Monitoring
sentrix monitor watch myapp:chatbot --interval 60 --webhook $SLACK_URL
sentrix monitor drift --baseline my-eval --window 24
# Dashboard & info
sentrix serve # open at :7234
sentrix history
sentrix costs --days 7