Open source · MIT · v1.1.2

Stop renting
intelligence.

Own it forever.

Crasis distills a frontier model's understanding of your task into a tiny, local ONNX specialist. Train once. Deploy everywhere. Zero API cost, near-zero latency, zero data leaving your device.

$ pip install crasis
GitHub
4MB
typical model size
<3ms
inference latency
$1.28
to build 10 specialists
$0.00
per query, forever

// how it works

The frontier model trains it.
Your hardware runs it.

Crasis calls a frontier model exactly once — to understand your task and generate labeled training data. That intelligence is distilled into a tiny encoder model. After that, the frontier model is never called again.

01

Write a spec

Describe your task in plain English. A spec is not code — it's a contract. Define what a positive example looks like, what to ignore, and your quality bar.
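A spec in this spirit might look like the sketch below. The field names (`task`, `positive`, `ignore`, `quality_bar`) are illustrative, not Crasis's documented schema:

```yaml
# Hypothetical spec sketch -- field names are illustrative,
# not Crasis's documented format.
task: refund-detector
description: >
  Flag customer messages that demand a refund or chargeback.
positive: >
  Explicit requests for money back, chargeback threats,
  "cancel my order and refund me" demands.
ignore: >
  General complaints with no refund demand; questions about
  refund policy asked before a purchase.
quality_bar:
  min_holdout_accuracy: 0.95
```

The point of the contract framing: everything the generator needs to label an example correctly lives in the spec, not in code.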

02

Generate training data

Crasis calls OpenRouter with enforce_distillable_text: true — routing only to models whose licenses explicitly permit distillation. Clean provenance on every sample.
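As a rough sketch of what that call could look like: the endpoint and message shape below follow OpenRouter's chat-completions API, but where `enforce_distillable_text` sits in the body, and the model and prompt, are assumptions for illustration.

```python
import json

# Sketch of a request body for OpenRouter's chat-completions API.
# The placement of `enforce_distillable_text` under `provider`, the
# model name, and the prompt are illustrative assumptions.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

payload = {
    "model": "openrouter/auto",  # let the router choose
    "provider": {
        # Only route to models whose licenses permit distillation.
        "enforce_distillable_text": True,
    },
    "messages": [
        {"role": "system", "content": "Generate labeled training samples."},
        {"role": "user", "content": "Spec: refund-detector. Emit JSONL."},
    ],
}

body = json.dumps(payload)
```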

03

Train locally

BERT-class distillation on your GPU. An RTX 4060 completes a binary specialist in under 30 minutes. CPU-only training is supported — it just takes longer.
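Crasis's training loop isn't shown here, but the core objective of logit distillation is small enough to sketch in plain Python: train the student's softened distribution toward the teacher's, using a temperature to expose the teacher's "dark knowledge" in the non-argmax classes. One common formulation, not Crasis's actual code:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened
    distribution -- the classic soft-label distillation objective."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return -sum(ti * math.log(si) for ti, si in zip(t, s))
```

The loss is minimized when the student's softened distribution matches the teacher's, which is what lets a 4 MB encoder inherit a frontier model's decision boundary on one narrow task.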

04

Export to ONNX. Deploy anywhere.

The output is a single ONNX file. Runs on laptops, Raspberry Pi, Jetson, mobile. ONNX Runtime is available in every language. No GPU required at inference time.

05

Improve with real data

When you have real labeled examples, crasis mix blends them with synthetic data at a configurable weight and retrains. The gap between synthetic and real accuracy closes fast.
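The blending idea is simple enough to sketch; this is the general technique, not `crasis mix`'s actual implementation, and the function name and parameters are illustrative:

```python
import random

def mix(real, synthetic, real_weight=0.5, n=None, seed=0):
    """Blend real and synthetic samples at a configurable weight.

    `real_weight` is the fraction of the output batch drawn (with
    replacement) from the real pool. A sketch of the idea behind
    `crasis mix`, not its actual implementation.
    """
    rng = random.Random(seed)
    n = n or len(real) + len(synthetic)
    n_real = round(n * real_weight)
    batch = [rng.choice(real) for _ in range(n_real)]
    batch += [rng.choice(synthetic) for _ in range(n - n_real)]
    rng.shuffle(batch)
    return batch
```

Upweighting scarce real examples is what lets a handful of labeled messages pull the model toward your actual distribution.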

terminal
$ crasis build --spec specs/refund-detector.yaml
▶ Generating 5,000 training samples via OpenRouter...
▶ Training BERT-Tiny on cuda:0 — epoch 10/10
▶ Eval: accuracy 0.9920 · f1 0.9914 · PASSED
▶ Exporting to ONNX (int8 quantization)...
✓ refund-detector-onnx — 4.3 MB · 0.55ms · ready
$ crasis classify --model ./models/refund-detector-onnx \
"I want my money back, this is ridiculous"
{ label: "positive", confidence: 0.97, latency_ms: 0.43 }
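The `{ label, confidence }` shape above is just softmaxed logits. A minimal post-processing sketch, assuming a two-class model whose runtime (e.g. an ONNX Runtime session) hands back raw logits; the label names are illustrative:

```python
import math

def postprocess(logits, labels=("negative", "positive")):
    """Map raw classifier logits to a {label, confidence} result.

    Illustrative sketch: assumes the model emits one logit per class,
    e.g. from an ONNX Runtime session's output tensor.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    probs = [e / sum(exps) for e in exps]
    idx = max(range(len(probs)), key=probs.__getitem__)
    return {"label": labels[idx], "confidence": round(probs[idx], 2)}
```

Because this is pure arithmetic over two floats, it adds effectively nothing to the sub-millisecond inference numbers quoted above.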

// pre-built specialists

Ten specialists. Ready now.

The tasks people most commonly pay frontier models to handle. Pull, deploy, never pay for them again. Accuracy is reported as both synthetic and holdout — see SCORECARD.md for full results.

whatsapp-triage
Pricing / availability inquiry detector
4.3 MB · 0.56ms
field-ready
invoice-intent
Payment / billing message detection
4.3 MB · 0.55ms
field-ready
pricing-detector
Is this message asking about cost?
4.3 MB · 0.57ms
field-ready
availability-handler
Scheduling request → calendar link trigger
4.3 MB · 0.62ms
field-ready
support-router
Multi-class ticket categorization (5 classes)
4.3 MB · 0.57ms
field-ready
meeting-parser
Extract who/when/what from scheduling messages
4.3 MB · 0.55ms
field-ready
social-classifier
Is this mention worth responding to?
4.3 MB · 0.56ms
field-ready
sentiment-gate
High-arousal anger detection
4.3 MB · 0.56ms
field-ready
email-urgency *
Reply-now vs read-later (4 classes)
10.8 MB · 2.79ms
experimental
spam-filter *
Personalizable noise classifier
10.8 MB · 2.67ms
experimental

* Experimental: holdout accuracy below 75% indicates synthetic-to-real distribution shift. Use crasis mix to improve with your own data.


// vs frontier api

The honest comparison.

Frontier models are brilliant generalists. For bounded classification tasks, that's overkill.

Metric                     Frontier API     Crasis Specialist
Model size                 4GB+             4–11MB
Cost per query             $0.001–0.01      $0.00
Inference latency          2–5 seconds      <3ms on CPU
Works offline              No               Yes
Data leaves device         Always           Never
Accuracy on narrow tasks   ~97%             See SCORECARD
Cost trend over time       Flat (toll)      Already free

// get started

One command.

Inference only. No PyTorch. No GPU. Just ONNX Runtime and a 4MB model.

$ pip install crasis
View on GitHub →

Full pipeline: pip install crasis[train]

Want the hosted pipeline? Crasis Studio →