coracle

License: CC BY-NC-SA 4.0 · Status: Pre-alpha · Python 3.11+ · Platform: macOS (Apple Silicon) · Agent-friendly

A personal-machine AI coracle that intelligently splits work between free-tier "big" cloud AI (planning) and local Ollama models (reasoning + execution), without ever spiking RAM enough to crash the machine. Built to be consumed as a drop-in OpenAI-compatible "model" by opencode, Claude Code, codex, Cursor, Continue, etc.

Why this exists

Big AI models are great at planning. Small local models are great at executing. Free API tiers run out. Browser-driven web AIs are flaky. RAM on a 16GB Mac is precious. None of the existing tools combine all of these gracefully — so this one does:

  • Resident reasoning model (qwen2.5:7b) classifies every request and routes it to the right pipeline.
  • Big AI (Gemini, Groq, Ollama Cloud, headless-browser fallback to Claude.ai/ChatGPT/Gemini-web) handles deep planning when the classifier asks for it.
  • Coder model (qwen2.5-coder:7b) executes steps locally with a full tool belt (fs, shell, web, browser, git).
  • Single-LLM-slot scheduler ensures only one 7B model is in RAM at a time.
  • SQLite job state powers instant status responses with zero RAM cost.
  • One model name to the consumer: coracle. Auto-routing is invisible.
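The routing described above can be sketched as a small dispatch table. This is an illustration of the idea, not coracle's actual internals; all names here are hypothetical:

```python
# Hypothetical sketch of the classifier → pipeline dispatch described above.
# Labels come from the resident reasoning model; pipeline names are illustrative.

PIPELINES = {
    "status": "db_read",        # answered straight from SQLite, no LLM loaded
    "fast": "local_only",       # resident qwen2.5:7b answers directly
    "deep": "plan_then_code",   # big AI plans, local coder executes, verify
    "research": "deep_web",     # deep pipeline biased toward web tools
}

def route(label: str) -> str:
    """Map a classifier label to a pipeline, defaulting to the fast path."""
    return PIPELINES.get(label, PIPELINES["fast"])
```

A safe default matters here: if the classifier emits something unexpected, falling back to the local-only fast path keeps the request cheap.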

Architecture at a glance

```
opencode / Claude Code / codex
         │  (OpenAI-compatible /v1/chat/completions)
         ▼
┌────────────────────────────────────────────────────┐
│ Resident reasoning model (qwen2.5:7b) — CLASSIFIER │
│ → fast | deep | research | status                  │
└────────────────────────────────────────────────────┘
    ┌────────┼────────┬─────────────────┐
    ▼        ▼        ▼                 ▼
 status     fast     deep           research
 (DB       (local-  (reason →      (deep + web
  read)     only)    big AI →       tools biased)
                     parse →
                     coder →
                     verify)
```

Full design details: docs/PLAN.md.

How is this different from LiteLLM?

Short version: LiteLLM is a paid-API gateway built for throughput; coracle is a personal-machine scheduler built for $0 budgets and a 16GB RAM ceiling. We use LiteLLM's SDK as our provider abstraction, but the product is a different thing entirely — see docs/VS_LITELLM.md for the full table.

|                   | LiteLLM         | coracle                                              |
|-------------------|-----------------|------------------------------------------------------|
| Cost model        | Pay-per-token   | $0 — free tiers + local + headless-browser fallback  |
| Topology          | Stateless proxy | Stateful job coracle                                 |
| Inference         | Cloud-first     | Local-first                                          |
| RAM target        | Server-class    | 16 GB Mac M1                                         |
| Tool execution    | Caller's job    | Coracle runs the tools (sandbox + MCP)               |
| Status / progress | None            | First-class, never loads an LLM                      |

Status

🚧 Pre-alpha — implementation underway.

Skeleton (package layout, settings loader, structured logging) landed in #31.

Issues are organized into 7 phases (Phase 1 → Phase 7) tracked via GitHub Milestones. Each phase has an Epic issue summarizing scope and linking to its sub-tasks.

This project is agent-friendly: every issue contains enough context, acceptance criteria, file paths, and definition-of-done that a coding agent (or human contributor) can pick it up cold, clone the repo, and submit a PR.

How to contribute (humans and agents)

  1. Pick a ready issue (label: status:ready) — these have no unresolved dependencies.
  2. Read the issue's Context, Acceptance Criteria, and Definition of Done.
  3. Reference docs/PLAN.md for the bigger picture.
  4. Open a PR linking the issue (Closes #N).
  5. Follow CONTRIBUTING.md.
  6. PRs are reviewed by a layered AI bot stack — see docs/REVIEW_BOTS.md. Only our strict code-reviewer-001 bot has merge authority; it waits for the AI bots to weigh in before approving.

Tech stack

| Concern            | Choice                                                              |
|--------------------|---------------------------------------------------------------------|
| Language           | Python 3.11+                                                        |
| Local models       | Ollama (qwen2.5:7b, qwen2.5-coder:7b)                               |
| Big AI providers   | litellm → Gemini, Groq, Ollama Cloud + Playwright headless fallback |
| External interface | OpenAI-compatible HTTP (primary) + MCP stdio + native HTTP + CLI    |
| Server             | FastAPI + Uvicorn                                                   |
| State              | SQLite                                                              |
| Browser            | Playwright (headless, separate subprocess per provider)             |
| RAM monitor        | psutil                                                              |

Hardware target

Mac M1 Pro, 16 GB RAM. Designed to never exceed ~11 GB resident.
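A hedged sketch of how the ~11 GB ceiling might be enforced. In the real project psutil would supply the resident-set number (e.g. via `psutil.Process().memory_info().rss`); the sketch below shows only the threshold logic, which needs no third-party imports:

```python
# Illustrative budget check for the ~11 GB resident ceiling.
# The constant and function name are hypothetical, not coracle's API.

RAM_BUDGET_GB = 11

def over_budget(used_bytes: int, budget_gb: float = RAM_BUDGET_GB) -> bool:
    """True when resident memory exceeds the budget (GB converted to bytes)."""
    return used_bytes > budget_gb * 1024**3
```

A scheduler could call this before loading a 7B model and unload the current occupant of the single LLM slot first if the answer is True.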

Wiring external MCP servers

The coracle can consume any number of remote/cloud MCP servers as local tools. Copy the example config and edit it:

```shell
cp config/mcp_servers.yaml.example config/mcp_servers.yaml
# edit config/mcp_servers.yaml — supports stdio | http | sse transports
coracle mcp list    # show connected servers + tool counts
coracle mcp reload  # re-read the config without restarting
```

Environment variables in the config (e.g. ${GITHUB_TOKEN}) are expanded at load time, so secrets stay out of source control.
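The `${VAR}` expansion can be sketched with the standard library. This illustrates the mechanism only and is not coracle's actual loader; note the stdlib's `os.path.expandvars` offers similar behavior:

```python
import os
import re

# Hypothetical sketch of ${VAR} expansion at config-load time.
# Unknown variables are left intact rather than silently blanked,
# which makes a missing secret visible in `coracle mcp list` output.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def expand_env(text: str) -> str:
    """Replace ${NAME} with os.environ['NAME'], keeping unknown names as-is."""
    return _VAR.sub(lambda m: os.environ.get(m.group(1), m.group(0)), text)
```

With `GITHUB_TOKEN=abc123` in the environment, `expand_env("token: ${GITHUB_TOKEN}")` yields `"token: abc123"`, so the YAML on disk never contains the secret itself.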

License

Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

You are free to share and adapt the material under these terms:

  • Attribution — credit the original author and link to the license.
  • NonCommercial — no commercial use.
  • ShareAlike — distribute derivative works under the same license.

See LICENSE for the full legal text.