Open source · MIT · v1.1.2

Stop renting
intelligence.

Own it forever.

Crasis distills a frontier model's understanding of your task into a tiny, local ONNX specialist. Train once. Deploy everywhere. Zero API cost, near-zero latency, zero data leaving your device.

$ pip install crasis
GitHub
4MB
typical model size
<3ms
inference latency
$1.28
to build 10 specialists
$0.00
per query, forever

// how it works

The frontier model trains it.
Your hardware runs it.

Crasis calls a frontier model exactly once — to understand your task and generate labeled training data. That intelligence is distilled into a tiny encoder model. After that, the frontier model is never called again.

01

Write a spec

Describe your task in plain English. A spec is not code — it's a contract. Define what a positive example looks like, what to ignore, and your quality bar.
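A spec in this spirit might look like the sketch below. The field names (`task`, `positive`, `ignore`, `quality_bar`) are illustrative, not Crasis's documented schema:

```yaml
# Hypothetical spec sketch -- field names are illustrative,
# not Crasis's documented format.
task: refund-detector
description: >
  Flag customer messages that demand a refund or chargeback.
positive: >
  Explicit requests for money back, chargeback threats,
  "cancel my order and refund me" demands.
ignore: >
  General complaints with no refund demand; questions about
  refund policy asked before a purchase.
quality_bar:
  min_holdout_accuracy: 0.95
```

The point of the contract framing: everything the generator needs to label an example correctly lives in the spec, not in code.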

02

Generate training data

Crasis calls OpenRouter with enforce_distillable_text: true — routing only to models whose licenses explicitly permit distillation. Clean provenance on every sample.
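As a rough sketch of what that call could look like: the endpoint and message shape below follow OpenRouter's chat-completions API, but where `enforce_distillable_text` sits in the body, and the model and prompt, are assumptions for illustration.

```python
import json

# Sketch of a request body for OpenRouter's chat-completions API.
# The placement of `enforce_distillable_text` under `provider`, the
# model name, and the prompt are illustrative assumptions.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

payload = {
    "model": "openrouter/auto",  # let the router choose
    "provider": {
        # Only route to models whose licenses permit distillation.
        "enforce_distillable_text": True,
    },
    "messages": [
        {"role": "system", "content": "Generate labeled training samples."},
        {"role": "user", "content": "Spec: refund-detector. Emit JSONL."},
    ],
}

body = json.dumps(payload)
```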

03

Train locally

BERT-class distillation on your GPU. An RTX 4060 completes a binary specialist in under 30 minutes. CPU-only training is supported — it just takes longer.
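Crasis's training loop isn't shown here, but the core objective of logit distillation is small enough to sketch in plain Python: train the student's softened distribution toward the teacher's, using a temperature to expose the teacher's "dark knowledge" in the non-argmax classes. One common formulation, not Crasis's actual code:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened
    distribution -- the classic soft-label distillation objective."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return -sum(ti * math.log(si) for ti, si in zip(t, s))
```

The loss is minimized when the student's softened distribution matches the teacher's, which is what lets a 4 MB encoder inherit a frontier model's decision boundary on one narrow task.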

04

Export to ONNX. Deploy anywhere.

The output is a single ONNX file. Runs on laptops, Raspberry Pi, Jetson, mobile. ONNX Runtime is available in every language. No GPU required at inference time.

05

Improve with real data

When you have real labeled examples, crasis mix blends them with synthetic data at a configurable weight and retrains. The gap between synthetic and real accuracy closes fast.
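The blending idea is simple enough to sketch; this is the general technique, not `crasis mix`'s actual implementation, and the function name and parameters are illustrative:

```python
import random

def mix(real, synthetic, real_weight=0.5, n=None, seed=0):
    """Blend real and synthetic samples at a configurable weight.

    `real_weight` is the fraction of the output batch drawn (with
    replacement) from the real pool. A sketch of the idea behind
    `crasis mix`, not its actual implementation.
    """
    rng = random.Random(seed)
    n = n or len(real) + len(synthetic)
    n_real = round(n * real_weight)
    batch = [rng.choice(real) for _ in range(n_real)]
    batch += [rng.choice(synthetic) for _ in range(n - n_real)]
    rng.shuffle(batch)
    return batch
```

Upweighting scarce real examples is what lets a handful of labeled messages pull the model toward your actual distribution.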

terminal
$ crasis build --spec specs/refund-detector.yaml
▶ Generating 5,000 training samples via OpenRouter...
▶ Training BERT-Tiny on cuda:0 — epoch 10/10
▶ Eval: accuracy 0.9920 · f1 0.9914 · PASSED
▶ Exporting to ONNX (int8 quantization)...
✓ refund-detector-onnx — 4.3 MB · 0.55ms · ready
$ crasis classify --model ./models/refund-detector-onnx \
"I want my money back, this is ridiculous"
{ label: "positive", confidence: 0.97, latency_ms: 0.43 }
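The `{ label, confidence }` shape above is just softmaxed logits. A minimal post-processing sketch, assuming a two-class model whose runtime (e.g. an ONNX Runtime session) hands back raw logits; the label names are illustrative:

```python
import math

def postprocess(logits, labels=("negative", "positive")):
    """Map raw classifier logits to a {label, confidence} result.

    Illustrative sketch: assumes the model emits one logit per class,
    e.g. from an ONNX Runtime session's output tensor.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    probs = [e / sum(exps) for e in exps]
    idx = max(range(len(probs)), key=probs.__getitem__)
    return {"label": labels[idx], "confidence": round(probs[idx], 2)}
```

Because this is pure arithmetic over two floats, it adds effectively nothing to the sub-millisecond inference numbers quoted above.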

// pre-built specialists

Ten specialists. Ready now.

The tasks people most commonly pay frontier models to handle. Pull, deploy, never pay for them again. Accuracy is reported as both synthetic and holdout — see SCORECARD.md for full results.

whatsapp-triage
Pricing / availability inquiry detector
4.3 MB · 0.56ms
field-ready
invoice-intent
Payment / billing message detection
4.3 MB · 0.55ms
field-ready
pricing-detector
Is this message asking about cost?
4.3 MB · 0.57ms
field-ready
availability-handler
Scheduling request → calendar link trigger
4.3 MB · 0.62ms
field-ready
support-router
Multi-class ticket categorization (5 classes)
4.3 MB · 0.57ms
field-ready
meeting-parser
Extract who/when/what from scheduling messages
4.3 MB · 0.55ms
field-ready
social-classifier
Is this mention worth responding to?
4.3 MB · 0.56ms
field-ready
sentiment-gate
High-arousal anger detection
4.3 MB · 0.56ms
field-ready
email-urgency *
Reply-now vs read-later (4 classes)
10.8 MB · 2.79ms
experimental
spam-filter *
Personalizable noise classifier
10.8 MB · 2.67ms
experimental

* Experimental: holdout accuracy below 75% indicates synthetic-to-real distribution shift. Use crasis mix to improve with your own data.


// vs frontier api

The honest comparison.

Frontier models are brilliant generalists. For bounded classification tasks, that's overkill.

Metric                     Frontier API     Crasis Specialist
Model size                 4GB+             4–11MB
Cost per query             $0.001–0.01      $0.00
Inference latency          2–5 seconds      <3ms on CPU
Works offline              No               Yes
Data leaves device         Always           Never
Accuracy on narrow tasks   ~97%             See SCORECARD
Cost trend over time       Flat (toll)      Already free

// get started

One command.

Inference only. No PyTorch. No GPU. Just ONNX Runtime and a 4MB model.

$ pip install crasis
View on GitHub →

Full pipeline: pip install crasis[train]

Want the hosted pipeline? Crasis Studio →