04 — LLM Providers & Account Management
Connect your AI accounts, set up failover, never hit rate limits.
Quick Start — Which provider should I use?
| If you want... | Use this | Cost |
|---|---|---|
| Cheapest possible | Claude CLI (Max subscription) | $20/month flat, unlimited |
| Best quality + speed | Anthropic API | ~$3-15/MTok |
| Access to everything | OpenRouter | Varies by model |
| Free + private | Ollama (local) | $0 (runs on your hardware) |
| Mix of all | All of the above | Arvis auto-rotates between them |
The recommended setup: Claude CLI as primary (cheap, unlimited) + one API key as backup.
Provider Setup Guides
1. Claude CLI (Max Subscription — $20/month, unlimited)
Use your Claude Pro/Max subscription with Arvis. No per-token cost — just your monthly subscription. This is the cheapest way to run Arvis.
Setup (all platforms — Windows, Mac, Linux)
Step 1: Install Claude Code
npm install -g @anthropic-ai/claude-code
Step 2: Add an account
npm run add-account
A browser opens — log in with your Claude account. Auth files save to data/accounts/acc1/. Done.
Step 3: Add more accounts (optional — for zero rate limits)
npm run add-account work
npm run add-account personal
Log in with a different Claude account each time. Each gets its own folder in data/accounts/.
That's it. No .env changes needed — Arvis auto-detects all accounts in data/accounts/ on startup.
Session expired? Re-login:
# Just run add-account again with the same name — it detects existing auth and re-authenticates
npm run add-account acc1
Advanced: Manual env var setup
If you prefer managing account directories yourself:
CLAUDE_CLI_HOME=/path/to/claude-home-dir
CLAUDE_CLI_HOME_1=/path/to/second-account
CLAUDE_CLI_HOME_2=/path/to/third-account
Good to know
- Each account = one Claude Pro/Max subscription
npm run add-accountauto-names accountsacc1,acc2,acc3... or you can pass a custom name- 3 accounts = practically unlimited messages (rate limits rotate between them)
- Auth folders contain tokens — don't share them or commit them to git
- On a headless VPS (no browser), the login prints a URL — open it on your phone/laptop to complete
2. Anthropic API (pay-per-token)
Best for: fast responses, full tool support, precise cost control.
Get your key: console.anthropic.com → API Keys → Create Key
ANTHROPIC_API_KEY=sk-ant-api03-xxxxx
Multiple keys (for higher rate limits):
ANTHROPIC_API_KEY=sk-ant-main-key
ANTHROPIC_API_KEY_1=sk-ant-backup-key
ANTHROPIC_API_KEY_2=sk-ant-team-key
Available models:
| Model | Speed | Quality | Cost (input/output per MTok) |
|---|---|---|---|
claude-opus-4-6 |
Slow | Best | $15 / $75 |
claude-sonnet-4-6 |
Fast | Great | $3 / $15 |
claude-haiku-4-5-20251001 |
Fastest | Good | $0.80 / $4 |
Agent model spec: claude-sonnet-4-6 or anthropic/claude-sonnet-4-6
3. OpenAI (pay-per-token)
Get your key: platform.openai.com → Create new secret key
OPENAI_API_KEY=sk-xxxxx
Multiple keys:
OPENAI_API_KEY=sk-main
OPENAI_API_KEY_1=sk-backup
Available models:
| Model | Speed | Quality | Cost (input/output per MTok) |
|---|---|---|---|
gpt-4.1 |
Fast | Great | $2 / $8 |
gpt-4.1-mini |
Fastest | Good | $0.40 / $1.60 |
gpt-4.1-nano |
Instant | Basic | $0.10 / $0.40 |
o4-mini |
Slow | Reasoning | $1.10 / $4.40 |
Agent model spec: openai/gpt-4.1-mini
4. OpenRouter (one key, all models)
Best for: trying different models without managing multiple API keys.
Get your key: openrouter.ai → Create Key
OPENROUTER_API_KEY=sk-or-xxxxx
Access to 200+ models: Claude, GPT-4, Gemini, Llama, DeepSeek, Qwen, Mistral, and more. Pricing varies per model.
Agent model spec: openrouter/claude-sonnet-4-6 or openrouter/meta-llama/llama-4-maverick
5. Google Gemini (pay-per-token)
Get your key: aistudio.google.com → Create API Key
GOOGLE_API_KEY=AIzaSyxxxxx
Available models:
| Model | Speed | Quality | Cost (input/output per MTok) |
|---|---|---|---|
gemini-2.5-pro |
Medium | Great | $1.25 / $10 |
gemini-2.5-flash |
Fast | Good | $0.30 / $2.50 |
gemini-2.0-flash-lite |
Fastest | Basic | Free tier available |
Agent model spec: google/gemini-2.5-flash
6. Ollama (free, runs locally)
Best for: privacy, offline use, zero cost. Requires a decent GPU.
Install: ollama.com/download
# Pull a model
ollama pull llama3
# or
ollama pull qwen2.5-coder:7b
OLLAMA_BASE_URL=http://localhost:11434
Agent model spec: ollama/llama3
7. Custom OpenAI-Compatible Provider
Any service with an OpenAI-compatible API (Together AI, Groq, Fireworks, etc.):
CUSTOM_BASE_URL=https://api.together.xyz/v1
CUSTOM_API_KEY=your-key
How Arvis Picks Which Account to Use
You don't have to think about this — it's automatic. But here's what happens:
Priority order (tries top to bottom)
| Priority | Provider | Why |
|---|---|---|
| 10 | Claude CLI | Free with subscription |
| 20 | Anthropic API | Fast + best Claude models |
| 50 | OpenAI | Good alternative |
| 60 | OpenRouter | Access to everything |
| 70 | Google Gemini | Solid backup |
| 200 | Ollama | Free, local, last resort |
Within the same provider, the account with the fewest messages used is picked (load balancing).
When one gets rate limited
Arvis automatically switches to the next available account. You don't see anything — maybe a slightly delayed response.
| Retry | Cooldown before retrying that account |
|---|---|
| 1st | 1 minute |
| 2nd | 5 minutes |
| 3rd | 25 minutes |
| 4th+ | 60 minutes (cap) |
If ALL accounts are exhausted, you'll see: "All AI accounts are rate-limited. Retrying automatically in ~X minutes."
Smart model selection
When Arvis has to use a backup provider, it picks the right size model:
- Short/simple messages ("hi", "what time is it") → cheap model (haiku/mini/flash)
- Long/complex messages ("analyze this code", "design a system") → powerful model (sonnet/opus/pro)
Recommended Setups
Budget setup (free or almost free)
# Just one Claude CLI account — $20/month, covers most usage
npm run add-account
# Optional: free local backup
OLLAMA_BASE_URL=http://localhost:11434
Standard setup (reliable, low cost)
# Primary: 2 Claude CLI accounts (auto-detected, no env vars needed)
npm run add-account
npm run add-account backup
# Backup: Anthropic API with cheap model
ANTHROPIC_API_KEY=sk-ant-xxxxx
Power setup (never rate limited, best quality)
# 3 Claude CLI accounts
npm run add-account
npm run add-account work
npm run add-account personal
# API backup
ANTHROPIC_API_KEY=sk-ant-main
ANTHROPIC_API_KEY_1=sk-ant-backup
# Fallback
OPENAI_API_KEY=sk-xxxxx
# Last resort
OLLAMA_BASE_URL=http://localhost:11434
Setting Models Per Agent
In Dashboard → Agents → click your agent → Config tab:
- Primary model:
anthropic/claude-sonnet-4-6 - Fallbacks:
openrouter/claude-sonnet-4-6, openai/gpt-4.1-mini
Or tell the Conductor: "Set dev-agent's model to claude-sonnet-4-6 with openai fallback"
Cost Tracking
Every API call is logged automatically. View in Dashboard → Usage page.
CLI accounts always show $0 cost (subscription model). API accounts show exact per-token costs.
You can disable expensive accounts in Dashboard → Settings → Accounts without removing them from .env.
Advanced: How Failover Works Internally
Request comes in for agent with model="claude-sonnet-4-6"
│
├─ STAGE 1: Try preferred provider (anthropic)
│ → Pick account with lowest usage that isn't rate-limited
│ → If found: use it. Done.
│
├─ STAGE 2: Try fallback chain from agent config
│ → agent.modelFallbacks = ["openrouter/claude-sonnet-4-6", "openai/gpt-4.1"]
│ → Try each in order until one works
│
└─ STAGE 3: Any account at all
→ classifyComplexity(prompt) → pick appropriate model size
→ If nothing available: queue for retry with backoff
Advanced: Per-Provider API Details
| Provider | Endpoint | Auth | Tool format |
|---|---|---|---|
| Anthropic | api.anthropic.com/v1/messages |
x-api-key header |
tool_use content blocks |
| OpenAI / OpenRouter / Ollama | /v1/chat/completions |
Bearer token |
function_call messages |
| Google Gemini | generativelanguage.googleapis.com |
key= query param |
functionCall / functionResponse |
Advanced: Tool Call Loop
When the LLM wants to use a tool (web_search, http_fetch, etc.), the Provider Runner automatically handles multi-turn tool calls — up to 5 rounds per request. The CLI runner delegates tool use to Claude Code's built-in tools (bash, file access, etc.).