

# How coracle differs from LiteLLM

LiteLLM is an excellent, commercially backed AI gateway. We use its Python SDK as our provider abstraction layer (#8). But the two products are entirely different things. This page exists so contributors and users can see where that line sits in 30 seconds.
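A minimal sketch of what that abstraction buys us: one call shape across local and cloud backends. The model identifiers below are illustrative, not coracle's shipped configuration, and each call of course needs a running backend or a valid key.

```python
# Illustrative only: litellm's SDK abstracts providers behind one call.
# Model names are examples, not coracle's actual config.
import litellm

messages = [{"role": "user", "content": "Summarize this diff."}]

# Same call shape whether the backend is a local Ollama model or a cloud API:
local = litellm.completion(model="ollama/qwen2.5-coder:7b", messages=messages)
cloud = litellm.completion(model="gemini/gemini-2.0-flash", messages=messages)

# Both responses follow the OpenAI response shape:
print(local.choices[0].message.content)
```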

## TL;DR

|  | LiteLLM | coracle |
| --- | --- | --- |
| Audience | Teams / orgs running paid LLM workloads at scale | One person on a laptop, paying nothing |
| Cost model | Pay-per-token via your provider keys | $0 — free tiers + local Ollama + headless-browser fallback |
| Topology | Stateless proxy / gateway | Stateful agent with a job lifecycle |
| Inference location | Cloud-first; local is just another provider | Local-first; cloud is just the planner |
| RAM model | Server-class (32GB+); concurrency is the goal | 16GB Mac M1; concurrency would crash the box |
| Request shape | One call → one provider → one response | One call → classify → consolidate → refine → big-AI plan → parse → local execute → verify (sketched below) |
| What reaches an LLM | Every request | Only after a local classifier decides the request needs one |
| Tool execution | Returns the tool-call JSON; the caller executes | coracle executes tools (fs, shell, git, browser, MCP) inside a sandbox |
| Failure mode for "free quota exhausted" | The caller's problem | Automatic provider fallback, then headless-browser fallback to web UIs |
| Status / progress UX | None — the proxy is stateless | First-class: a status query never loads an LLM |
| MCP | Gateway: proxy upstream MCPs to LLMs (#45 parity here) | Same gateway shape, plus coracle-as-MCP-server (#17) and a config-driven MCP client (#45) |
| A2A | First-class agent gateway | Out of scope (single-agent personal tool) |
| Multi-tenancy | Virtual keys, spend tracking, admin dashboard | Single-user; localhost-bound by default |
| Latency target | 8 ms P95 at 1k RPS | Doesn't matter; correctness > throughput |
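The "Request shape" row is the architectural heart of the difference, so here is a toy skeleton of that lifecycle. Every function below is a trivial stand-in invented for this page (the real stages live in the linked issues); the point is the shape: local stages bracket at most one cloud round-trip, versus a gateway's single proxied call.

```python
# Toy skeleton of coracle's request lifecycle. All names are invented
# stand-ins; contrast with a gateway, where the whole request is one
# proxied completion call.

def classify_locally(prompt: str) -> bool:
    """Stand-in for the local intent classifier (#37): big planner needed?"""
    return len(prompt) > 80  # placeholder heuristic, not the real classifier

def consolidate_and_refine(prompt: str) -> str:
    """Stand-in for local context consolidation + prompt rewriting (#39, #40)."""
    return f"[refined] {prompt}"

def plan_with_cloud(refined: str) -> list[str]:
    """Stand-in for the single cloud round-trip to the planner."""
    return [f"step 1 of plan for: {refined}"]

def execute_locally(step: str) -> str:
    """Stand-in for sandboxed local tool execution (#26-#29)."""
    return f"done: {step}"

def handle_request(prompt: str) -> list[str]:
    if not classify_locally(prompt):
        return [execute_locally(prompt)]        # small jobs never touch the cloud
    refined = consolidate_and_refine(prompt)    # all local so far
    steps = plan_with_cloud(refined)            # the only network call
    return [execute_locally(s) for s in steps]  # local execution + verification
```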

## What LiteLLM does that we deliberately do not do

- Virtual keys / RBAC / admin dashboard — irrelevant for a single-user tool
- A2A agent protocol — out of scope
- Embeddings, image, audio, batch, and rerank endpoints — only chat completions for now (passthrough is tracked as #56, but at P3)
- Sub-10ms P95 routing — we are bottlenecked by classifier latency anyway
- Enterprise guardrails (Lakera, Aporia, etc.) — we ship a thin local guardrail layer (#55) with no commercial integrations
- 100+ providers — we curate ~5 free-tier providers; LiteLLM's SDK gets us the rest if anyone needs them

## What LiteLLM does that we adopted (or should)

- OpenAI-compatible API as the primary surface — yes (#11)
- Drop-in `base_url` swap — yes (see the sketch after this list)
- Provider abstraction — yes, via the litellm SDK as a dependency (#8)
- MCP gateway shape (`tools[].type="mcp"` in `/v1/chat/completions`) — yes, #56
- Spend-equivalent observability — yes, but for free-tier quota rather than money — #54 (audit log) + #20 (quota tracking)
- Guardrails / prompt-injection defense — yes, #55 (local-only, no SaaS dependencies)
- Streaming SSE — yes (#11)
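For concreteness, here is what the drop-in swap looks like from any OpenAI-compatible client. The address, port, and model name are assumptions for illustration; coracle binds localhost by default and exposes a single model name per #37.

```python
# Drop-in swap: point any OpenAI-compatible client at the local endpoint.
# URL, port, and model name below are assumed for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",  # assumed local address
    api_key="unused",                     # single-user tool; no real key needed
)

resp = client.chat.completions.create(
    model="coracle",  # the one model name the auto-router exposes (#37)
    messages=[{"role": "user", "content": "hello"}],
    stream=True,      # SSE streaming per #11
)
for chunk in resp:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")
```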

## What we do that LiteLLM does not (and architecturally cannot)

  1. Single-LLM-slot scheduler with hard RAM ceiling (#34, #33). LiteLLM never holds an LLM resident — it never had to solve this. (See the first sketch after this list.)
  2. Two-model split (reasoning 7B + coder 7B, never co-resident) (#35, p3-*).
  3. Local classifier auto-router — the user sees one model name; intent classification happens locally before any cloud call (#37).
  4. Prompt-refinement pipeline — local model consolidates context and rewrites the prompt before posting upstream (#39, #40).
  5. Headless-browser fallback to Claude.ai / ChatGPT / Gemini-web when API quotas are exhausted (#9). This is explicitly out of scope for any commercial gateway because it sits in a ToS grey area; for a personal tool it's fair use of the user's own session.
  6. Job lifecycle with checkpoints — every coder step writes to SQLite before yielding the LLM slot, so a crash is recoverable (#21, #32).
  7. Status without an LLM — three-tier status mode (DB-templated → 1.5B narrator → queued reasoning) so progress checks never force a second 7B load (#12, #14, #16).
  8. Sandboxed tool execution for fs/shell/git/browser (#26-#29).
  9. Free-tier quota bookkeeping persisted across restarts — knows it has 1500 Gemini RPD and counts down (#20). (See the second sketch after this list.)
  10. CC BY-NC-SA license — explicitly non-commercial. LiteLLM is MIT-licensed with a commercial Enterprise tier.
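A minimal sketch of the single-slot idea from item 1, assuming an asyncio server and an invented `run_with_model` helper. This is the shape of #34/#33, not coracle's actual code.

```python
# Sketch: one semaphore permit = one resident LLM; a hard ceiling check
# refuses loads that would blow a 16GB machine. Names are invented.
import asyncio

RAM_CEILING_GB = 12.0         # assumed budget, leaving OS headroom on a 16GB box
_slot = asyncio.Semaphore(1)  # exactly one LLM may be resident at a time

async def run_with_model(model_size_gb: float, job):
    """Queue until the single LLM slot frees, then run `job` inside it."""
    if model_size_gb > RAM_CEILING_GB:
        raise MemoryError(f"{model_size_gb}GB model exceeds the {RAM_CEILING_GB}GB ceiling")
    async with _slot:
        # Load the model, run the job, unload before releasing the slot,
        # so the reasoning 7B and the coder 7B are never co-resident (#35).
        return await job()
```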
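And a sketch of item 9's quota bookkeeping, persisted so a restart doesn't forget how much free tier is left. Table and column names here are invented for illustration.

```python
# Sketch: per-provider daily request counters in SQLite. Schema is invented;
# the real tracking is #20.
import sqlite3
from datetime import date

db = sqlite3.connect("quota.db")
db.execute("""CREATE TABLE IF NOT EXISTS quota
              (provider TEXT, day TEXT, used INTEGER,
               PRIMARY KEY (provider, day))""")

DAILY_LIMITS = {"gemini": 1500}  # e.g. 1500 requests/day on Gemini's free tier

def try_spend(provider: str) -> bool:
    """Count one request against today's quota; False means fall back."""
    today = date.today().isoformat()
    row = db.execute("SELECT used FROM quota WHERE provider=? AND day=?",
                     (provider, today)).fetchone()
    used = row[0] if row else 0
    if used >= DAILY_LIMITS[provider]:
        return False  # quota gone: try the next provider, then the browser (#9)
    db.execute("INSERT INTO quota VALUES (?,?,?) "
               "ON CONFLICT(provider, day) DO UPDATE SET used = used + 1",
               (provider, today, used + 1))
    db.commit()
    return True
```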

## Bottom line

LiteLLM is the right answer if you have a budget, a fleet of paid LLM keys, and need ten thousand requests per second routed through one endpoint with audit and RBAC.

coracle is the right answer if you have one laptop, zero budget, two 7B local models, and want a coding agent that just works by spending free-tier credits intelligently and falling back to a browser when those run out.

They're complementary, not competing. We import litellm as a library; LiteLLM users will never need us.