Production RAG. In 20 lines of TypeScript.
Ingest any document. Get cited, source-backed answers. Bring your own LLM — or run it fully local with Ollama. Open source and free forever.
$ npm install @aialchemy/rag-sdk
Also works with pnpm, yarn, and bun.
TypeScript-first · ESM · Node 22+ · MIT licensed
From zero to cited answers in 60 seconds
No framework to learn. No glue code. One factory, one ingest call, one query.
import { createRag } from '@aialchemy/rag-sdk';

const rag = await createRag({
  mistral: { apiKey: process.env.MISTRAL_API_KEY! },
  qdrant: { url: process.env.QDRANT_URL!, collection: 'my-docs' },
  embeddings: {
    provider: 'openai',
    model: 'text-embedding-3-small',
    apiKey: process.env.OPENAI_API_KEY!,
  },
});

// Ingest a PDF — OCR, chunking, embedding, indexing. One call.
await rag.ingest.file('./contracts/agreement.pdf');

// Query with citations baked in.
const { matches } = await rag.retrieve('What are the termination clauses?');
console.log(matches[0].content);
console.log(matches[0].citation); // { sourceName, pageStart, pageEnd }

Built for developers who ship
Three things the SDK does differently — and why they matter in production.
Citation-first answering
Every answer traces back to the source document and page range. No hallucinations, no "trust me bro." If the evidence isn't there, the SDK refuses to answer by default.
Bring your own LLM (or run it local)
OpenAI, Anthropic, Google Gemini, HuggingFace, or fully local with Ollama. Mix providers — e.g. Gemini for embeddings, Claude for answering. Swap with one config line.
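As a concrete illustration, here is what a mixed-provider setup could look like. The key names mirror the quick-start example above, but the `answering` block and the specific model names are assumptions, not confirmed SDK options.

```typescript
// Hypothetical mixed-provider config. Key names follow the quick-start
// example; the `answering` block and model names are illustrative.
const config = {
  embeddings: {
    provider: 'gemini',             // Gemini generates the vectors
    model: 'text-embedding-004',
    apiKey: process.env.GEMINI_API_KEY,
  },
  answering: {
    provider: 'anthropic',          // Claude writes the cited answer
    model: 'claude-sonnet',
    apiKey: process.env.ANTHROPIC_API_KEY,
  },
};
```

Because each stage names its own provider, swapping either one is a one-line change.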
Multi-tenant out of the box
Ship a SaaS knowledge base in a weekend. Per-tenant Qdrant collections with enforced metadata filters. Async job mode for production pipelines. Typed errors, telemetry hooks, Zod-validated config.
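The per-tenant isolation described above can be sketched with two small helpers. Both `collectionFor` and `tenantFilter` are illustrative, not part of the SDK's public API; only the filter shape itself follows standard Qdrant filter syntax.

```typescript
// Sketch of per-tenant isolation, assuming one Qdrant collection per
// tenant. These helpers are illustrative, not SDK exports.
function collectionFor(tenantId: string): string {
  // Keep collection names predictable and collision-free.
  return `kb_${tenantId.toLowerCase().replace(/[^a-z0-9]/g, '_')}`;
}

function tenantFilter(tenantId: string) {
  // Enforced metadata filter: every query is scoped to the tenant,
  // even if the caller forgets to pass one.
  return { must: [{ key: 'tenantId', match: { value: tenantId } }] };
}

console.log(collectionFor('Acme Inc.')); // kb_acme_inc_
```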
Start free. Scale when you're ready.
Same SDK across every tier. Upgrade when you're tired of running infra.
Self-Host
Free forever
For developers and open-source projects
- Full SDK, MIT licensed
- All file formats + OCR
- Any LLM provider (incl. Ollama)
- Multi-tenant support
- Community support via GitHub
- Unlimited everything (your infra)
npm install @aialchemy/rag-sdk
View on GitHub

Managed
A$5 / month
For teams that want to skip the ops
- Everything in Self-Host
- 1 GB vector storage
- 10,000 OCR pages / month
- 1 project
- Hosted vector DB + document storage
- One API key, zero infra
- Email support
Enterprise
Custom
For teams with scale, compliance, or custom needs
- Everything in Managed
- Dedicated infrastructure
- SSO, audit logs, custom SLA
- VPC deployment options
- Volume pricing on OCR + storage
- Dedicated support engineer
All prices in AUD. Managed service launching soon — waitlist members get priority access and early-bird pricing.
Managed service launching soon
Self-host today with the open-source SDK. Join the waitlist to be first on the managed service — same SDK, zero ops, from A$5/month.
Prefer to self-host? npm install @aialchemy/rag-sdk and you're done.
Frequently asked questions
Is it actually free?
Yes. The SDK is MIT licensed and free forever when you self-host. You only pay for your own LLM API calls and your own Qdrant and storage infrastructure. No seat limits, no usage caps, no free-tier gotchas.
What's the catch with the A$5 managed plan?
No catch — it is the entry price. You get 1 GB of vector storage, 10,000 OCR pages per month, and one project. When you outgrow it, pricing scales with your usage. The SDK itself is identical across self-host and managed, so there is no lock-in.
Can I migrate from self-host to managed later?
Yes. It is the same SDK and the same API. You swap the config to point at our managed endpoint — the rest of your code does not change.
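A hypothetical before/after of that swap. The managed service is not yet public, so the `endpoint` and `apiKey` keys shown here are placeholders, not documented options.

```typescript
// Self-host: you run Qdrant and bring your own provider keys.
const selfHost = {
  qdrant: { url: 'http://localhost:6333', collection: 'my-docs' },
};

// Managed (hypothetical shape): one API key, hosted storage.
// The real config keys may differ once the managed service launches.
const managed = {
  endpoint: 'https://rag.example.com', // placeholder URL
  apiKey: process.env.RAG_API_KEY,
};
```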
Which LLM providers are supported?
OpenAI, Anthropic, Google Gemini, HuggingFace, and Ollama for local models. You can mix and match — for example, Gemini for embeddings and Claude for answer generation. Install only the providers you actually use.
Does it work without a cloud LLM at all?
Yes. Point embeddings and answering at a local Ollama instance and the SDK runs fully offline. Useful for privacy-sensitive data or air-gapped environments.
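A sketch of what a fully local configuration could look like. The `ollama` provider name follows the provider list above, but the `baseUrl` option and the model names are illustrative assumptions.

```typescript
// Fully-offline sketch: both stages point at a local Ollama instance.
// Key names mirror the quick-start example; model names are illustrative.
const offlineConfig = {
  qdrant: { url: 'http://localhost:6333', collection: 'private-docs' },
  embeddings: {
    provider: 'ollama',
    model: 'nomic-embed-text',      // runs locally, no API key needed
    baseUrl: 'http://localhost:11434',
  },
  answering: {
    provider: 'ollama',
    model: 'llama3.1',
    baseUrl: 'http://localhost:11434',
  },
};
```

Nothing in this config references a cloud endpoint, which is the point: no document content ever leaves the machine.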
How does citation-backed answering work?
Every retrieved chunk carries the source document name and page range. When the SDK generates an answer, it refuses by default if the retrieved evidence is not sufficient. Configurable via refuse, warn, or allow modes.
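The three modes can be summarized as a small decision table. This is illustrative logic only; `answerPolicy` is a hypothetical helper, not the SDK's implementation.

```typescript
type EvidenceMode = 'refuse' | 'warn' | 'allow';

// Illustrative decision logic for the three evidence modes. The SDK's
// actual thresholds and return shape are not specified here.
function answerPolicy(evidenceFound: boolean, mode: EvidenceMode) {
  if (evidenceFound) return { answer: true, warning: false };
  switch (mode) {
    case 'refuse':
      return { answer: false, warning: false }; // default: no evidence, no answer
    case 'warn':
      return { answer: true, warning: true };   // answer, but flag it
    case 'allow':
      return { answer: true, warning: false };  // caller accepts the risk
  }
}
```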
What are the system requirements?
Node.js 22 or later. TypeScript-first but works in plain JavaScript. ESM-only. You will need a Mistral API key for OCR and a Qdrant instance (self-hosted or cloud).
Ship RAG this afternoon.
Open source. TypeScript-first. Free to self-host, A$5/month managed when you're ready.