@mariozechner/pi-ai
Unified LLM Library
TypeScript library providing unified access to 20+ LLM providers with automatic model discovery, token tracking, cost monitoring, and context persistence.
Design constraint: pi-ai only includes models that support tool calling. This makes it ideal for agentic workflows but means you won't find raw text-completion-only models here.
Why this library exists
Every LLM provider has a slightly different API — and it's exhausting.
Anthropic uses messages with a content array. OpenAI uses the “chat completions” shape. Gemini wants yet another structure. Each supports tool-calling, but the JSON schema for how a tool is defined and how a tool call comes back is different for every one of them. Reasoning tokens, streaming events, cost fields — all different. Writing an agent from scratch against four providers means writing four agents.
pi-ai normalizes all of it into one TypeScript API. You write code once against the complete() / stream() interface. Behind the scenes, pi-ai translates to whichever provider you pointed at. Switch from Claude to GPT to Qwen with a one-line change — the rest of your agent doesn't notice.
In plain terms
TypeBox = a TypeScript library for describing JSON schemas with static types.
Type.Object({ location: Type.String() }) gives you both a validator at runtime
and a compile-time type. pi-ai uses TypeBox to define tool parameters — your tool definitions are type-safe end-to-end.
Unified reasoning (Extended Thinking): different providers expose reasoning differently — Anthropic has “extended thinking,” OpenAI has “o1/o3” reasoning tokens, Gemini has its own flavor. pi-ai boils them down to one knob:
reasoning: 'off' | 'low' | 'medium' | 'high'. Library handles provider-specific translation.
Streaming deltas = the model sends its response in chunks as it generates.
text_delta is a new piece of the visible reply;
toolcall_delta is a piece of a tool's JSON arguments as they're being built;
thinking_delta is a piece of the model's private reasoning. All arrive live;
done fires at the end.
openai-completions API dialect = the OpenAI chat-completions shape has become the de-facto standard. Ollama, LM Studio, vLLM, Together, Groq — they all speak it. So when you point pi-ai at
http://localhost:11434/v1 (Ollama), it just works.
Proxy URL = route requests through your own backend instead of calling the provider directly. Essential for browser apps where shipping the API key to the user's browser would leak it.
Supported Providers
OpenAI
Anthropic
Google Gemini
Google Vertex AI
Azure OpenAI
Amazon Bedrock
Mistral
Groq
Cerebras
xAI
OpenRouter
Vercel AI Gateway
MiniMax
Kimi / Moonshot
GitHub Copilot
Ollama / LM Studio / vLLM
Basic Usage
import { getModel, complete, stream } from '@mariozechner/pi-ai';
// Type-safe model discovery with auto-complete
const model = getModel('openai', 'gpt-4o-mini');
const context = {
systemPrompt: 'You are a helpful assistant.',
messages: [
{ role: 'user', content: 'Hello!' }
]
};
const result = await complete(model, context);
console.log(result.content, result.cost);
Streaming
for await (const event of stream(model, context)) {
switch (event.type) {
case 'text_delta':
process.stdout.write(event.delta);
break;
case 'toolcall_delta':
// Partial tool arguments as JSON stream
break;
case 'thinking_delta':
// Reasoning content
break;
case 'done':
console.log('Stop reason:', event.stopReason);
break;
}
}
Tool Calling with TypeBox
import { Type, StringEnum } from '@mariozechner/pi-ai';
const tools = [{
name: 'get_weather',
description: 'Get current weather',
parameters: Type.Object({
location: Type.String(),
units: StringEnum(['celsius', 'fahrenheit'])
})
}];
const result = await complete(model, { ...context, tools });
Extended Thinking
// Unified reasoning across Claude, GPT-5, Gemini 2.5
await completeSimple(model, context, {
reasoning: 'high' // 'off' | 'low' | 'medium' | 'high'
});
OAuth Login
import {
loginGitHubCopilot,
getOAuthApiKey
} from '@mariozechner/pi-ai/oauth';
const credentials = await loginGitHubCopilot({
onVerificationUri: (uri) => console.log('Visit:', uri)
});
const { apiKey } = await getOAuthApiKey('github-copilot', auth);
Streaming Events
| Event | Description |
| text_delta | Streamed response text chunk |
| toolcall_delta | Partial tool arguments (JSON streaming) |
| thinking_delta | Model reasoning content |
| done | Completion with stop reason and token usage |
| error | Failure with any partial content preserved |
Custom Endpoints
const ollamaModel = {
id: 'llama-3.1-8b',
api: 'openai-completions',
baseUrl: 'http://localhost:11434/v1',
reasoning: false,
input: ['text'],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 128000,
maxTokens: 32000
};
Cross-provider handoffs: Switch models mid-conversation while preserving thinking blocks and tool results. Thinking automatically converts to <thinking> tagged text for providers that don't support native reasoning.
Walk-through
Switching from Claude to GPT mid-conversation without breaking anything
- step 1 You're 15 turns into a design session with Claude Opus. It's been carefully reasoning; thinking blocks are embedded in the history. You hit Claude's reasoning rate limit.
- step 2 You swap the model:
model = getModel('openai', 'gpt-4o'). Nothing else in your code changes.
- step 3 On the next
stream() call, pi-ai inspects the context. Claude-style thinking blocks don't map to GPT-4o's format — so pi-ai automatically converts them into <thinking>...</thinking> tagged plain text that GPT-4o can read as context.
- step 4 Claude's tool-call messages get reshaped to the OpenAI tool_calls format. Tool result messages get re-keyed.
- step 5 GPT-4o gets a coherent conversation history and responds. The streaming events that come back are the same
text_delta / done shape as Claude — your consumer doesn't know a switch happened.
- step 6 A few turns later, you swap to a local Ollama model via Custom Endpoint for privacy. pi-ai converts the OpenAI-shaped history to openai-completions shape (mostly identical) — Ollama picks up the thread.
- step 7 Total code changes across three provider hops: three strings. pi-ai absorbed everything else.
Common mistakes
- Using a model that doesn't support tool calling and wondering why tools never fire. pi-ai's registry filters to tool-capable models, but Custom Endpoint models you register yourself bypass that filter. Verify.
- Shipping API keys in a browser bundle. pi-ai works in the browser, but calls to provider APIs from the browser either CORS-fail or expose your key. Use the
proxyUrl option with a Worker or backend that holds the key server-side.
- Forgetting that
reasoning: 'high' costs real money. High-reasoning Claude/GPT runs can easily eat $0.10+ per turn. Track result.cost and set budgets.
- Assuming streaming events arrive in a specific order across providers. Most providers interleave
thinking_delta / text_delta, but timing varies. Write consumer code that handles any order.
- Defining tools without TypeBox. You can technically pass plain JSON Schema — but you lose the compile-time types pi-ai generates from TypeBox. Use TypeBox; future-you will appreciate the IntelliSense.
- Registering a Custom Endpoint with wrong
contextWindow. pi-ai uses that number to decide when to warn about overflow. Too high = silent truncation. Check the model's real context window and enter it accurately.