Early Access · Open Source · MIT

Production RAG. In 20 lines of TypeScript.

Ingest any document. Get cited, source-backed answers. Bring your own LLM — or run it fully local with Ollama. Open source and free forever.

$ npm install @aialchemy/rag-sdk

Also works with pnpm, yarn, and bun.

TypeScript-first · ESM · Node 22+ · MIT licensed

From zero to cited answers in 60 seconds

No framework to learn. No glue code. One factory, one ingest call, one query.

rag-quickstart.ts
import { createRag } from '@aialchemy/rag-sdk';

const rag = await createRag({
  mistral:    { apiKey: process.env.MISTRAL_API_KEY! },
  qdrant:     { url: process.env.QDRANT_URL!, collection: 'my-docs' },
  embeddings: {
    provider: 'openai',
    model: 'text-embedding-3-small',
    apiKey: process.env.OPENAI_API_KEY!,
  },
});

// Ingest a PDF — OCR, chunking, embedding, indexing. One call.
await rag.ingest.file('./contracts/agreement.pdf');

// Query with citations baked in.
const { matches } = await rag.retrieve('What are the termination clauses?');
console.log(matches[0].content);
console.log(matches[0].citation); // { sourceName, pageStart, pageEnd }

Built for developers who ship

Three things the SDK does differently — and why they matter in production.

Citation-first answering

Every answer traces back to the source document and page range. No hallucinations, no "trust me bro." If the evidence isn't there, the SDK refuses to answer by default.

Bring your own LLM (or run it local)

OpenAI, Anthropic, Google Gemini, HuggingFace, or fully local with Ollama. Mix providers — e.g. Gemini for embeddings, Claude for answering. Swap with one config line.
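
A minimal sketch of a mixed setup, extending the quickstart config above. The `llm` block, provider ids, and model names here are illustrative assumptions, not confirmed option names; check the SDK docs for the real keys.

// Hypothetical mix: Gemini embeds, Claude answers. Option names are assumptions.
const rag = await createRag({
  mistral:    { apiKey: process.env.MISTRAL_API_KEY! }, // OCR, as in the quickstart
  qdrant:     { url: process.env.QDRANT_URL!, collection: 'my-docs' },
  embeddings: {
    provider: 'google',                 // assumed provider id for Gemini
    model: 'text-embedding-004',        // example model id
    apiKey: process.env.GEMINI_API_KEY!,
  },
  llm: {
    provider: 'anthropic',              // assumed key for the answering model
    model: 'claude-sonnet-4-5',         // example model id
    apiKey: process.env.ANTHROPIC_API_KEY!,
  },
});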

Multi-tenant out of the box

Ship a SaaS knowledge base in a weekend. Per-tenant Qdrant collections with enforced metadata filters. Async job mode for production pipelines. Typed errors, telemetry hooks, Zod-validated config.
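
A sketch of the per-tenant pattern, reusing the `collection` option from the quickstart. The factory helper and collection naming scheme are plain application code of our own invention, not SDK API; the metadata-filter enforcement described above happens inside the SDK.

// One Qdrant collection per tenant for hard isolation.
// The helper name and naming scheme are illustrative.
function ragForTenant(tenantId: string) {
  return createRag({
    mistral:    { apiKey: process.env.MISTRAL_API_KEY! },
    qdrant:     { url: process.env.QDRANT_URL!, collection: `kb-${tenantId}` },
    embeddings: {
      provider: 'openai',
      model: 'text-embedding-3-small',
      apiKey: process.env.OPENAI_API_KEY!,
    },
  });
}

const acme = await ragForTenant('acme');
await acme.ingest.file('./uploads/acme/handbook.pdf');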

Supported formats

PDF
DOCX
PPTX
PNG
JPEG
TIFF
WEBP
GIF
BMP
AVIF
TXT

Start free. Scale when you're ready.

Same SDK across every tier. Upgrade when you're tired of running infra.

Self-Host

Free forever

For developers and open-source projects

  • Full SDK, MIT licensed
  • All file formats + OCR
  • Any LLM provider (incl. Ollama)
  • Multi-tenant support
  • Community support via GitHub
  • Unlimited everything (your infra)
npm install @aialchemy/rag-sdk
View on GitHub
Most Popular

Managed

A$5 / month

For teams that want to skip the ops

  • Everything in Self-Host
  • 1 GB vector storage
  • 10,000 OCR pages / month
  • 1 project
  • Hosted vector DB + document storage
  • One API key, zero infra
  • Email support
Coming soon
Join the waitlist

Enterprise

Custom

For teams with scale, compliance, or custom needs

  • Everything in Managed
  • Dedicated infrastructure
  • SSO, audit logs, custom SLA
  • VPC deployment options
  • Volume pricing on OCR + storage
  • Dedicated support engineer
Contact sales

All prices in AUD. Managed service launching soon — waitlist members get priority access and early-bird pricing.

Managed service launching soon

Self-host today with the open-source SDK. Join the waitlist to be first on the managed service — same SDK, zero ops, from A$5/month.

No spam. We'll email you exactly once, when the managed service goes live.

Prefer to self-host? npm install @aialchemy/rag-sdk and you're done.

Frequently asked questions

Is it actually free?

Yes. The SDK is MIT licensed and free forever when you self-host. You pay only for your own LLM API calls and the Qdrant and storage infrastructure you run. No seat limits, no usage caps, no free-tier gotchas.

What's the catch with the A$5 managed plan?

No catch: A$5 is the entry price. You get 1 GB of vector storage, 10,000 OCR pages per month, and one project. When you outgrow it, pricing scales with your usage. The SDK itself is identical across self-host and managed, so there is no lock-in.

Can I migrate from self-host to managed later?

Yes. It is the same SDK and the same API. You swap the config to point at our managed endpoint — the rest of your code does not change.
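
As a hypothetical before-and-after, based on the "one API key, zero infra" description of the managed tier: the `managed` config key and env var name below are assumptions, not published API.

// Self-host today: you run Qdrant and bring your own provider keys.
const selfHosted = await createRag({
  mistral:    { apiKey: process.env.MISTRAL_API_KEY! },
  qdrant:     { url: process.env.QDRANT_URL!, collection: 'my-docs' },
  embeddings: {
    provider: 'openai',
    model: 'text-embedding-3-small',
    apiKey: process.env.OPENAI_API_KEY!,
  },
});

// Managed later (assumed shape): one key, hosted vector DB and storage.
const managed = await createRag({
  managed: { apiKey: process.env.AIALCHEMY_API_KEY! }, // hypothetical option
});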

Which LLM providers are supported?

OpenAI, Anthropic, Google Gemini, HuggingFace, and Ollama for local models. You can mix and match — for example, Gemini for embeddings and Claude for answer generation. Install only the providers you actually use.

Does it work without a cloud LLM at all?

Yes. Point embeddings and answering at a local Ollama instance and the SDK runs fully offline. Useful for privacy-sensitive data or air-gapped environments.
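
A fully-local sketch, assuming an `ollama` provider id and Ollama's default endpoint. The option names are guesses, and OCR-heavy ingestion may still need the Mistral key noted under system requirements.

// Hypothetical all-local config; provider ids and the baseUrl option are assumptions.
const rag = await createRag({
  qdrant: { url: 'http://localhost:6333', collection: 'local-docs' },
  embeddings: {
    provider: 'ollama',
    model: 'nomic-embed-text',          // any embedding model you've pulled
    baseUrl: 'http://localhost:11434',  // Ollama's default port
  },
  llm: {
    provider: 'ollama',
    model: 'llama3.1',
    baseUrl: 'http://localhost:11434',
  },
});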

How does citation-backed answering work?

Every retrieved chunk carries the source document name and page range. When the SDK generates an answer, it refuses by default if the retrieved evidence is not sufficient. Configurable via refuse, warn, or allow modes.
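
A sketch of setting that policy, using the three mode names above; the method and option names are assumptions rather than confirmed API.

// 'refuse' | 'warn' | 'allow' come from the answer above; the call shape is illustrative.
const result = await rag.answer('What are the termination clauses?', {
  evidenceMode: 'refuse', // the default: no sufficient evidence, no answer
});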

What are the system requirements?

Node.js 22 or later. TypeScript-first but works in plain JavaScript. ESM-only. You will need a Mistral API key for OCR and a Qdrant instance (self-hosted or cloud).

Ship RAG this afternoon.

Open source. TypeScript-first. Free to self-host, A$5/month managed when you're ready.