Production RAG. In 20 lines of TypeScript.
Ingest any document. Get cited, source-backed answers. Bring your own LLM — or run it fully local with Ollama. Open source and free forever.
$ npm install @aialchemy/rag-sdk
Also works with pnpm, yarn, and bun.
TypeScript-first · ESM · Node 22+ · MIT licensed
From zero to cited answers in 60 seconds
No framework to learn. No glue code. One factory, one ingest call, one query.
import { createRag } from '@aialchemy/rag-sdk';

const rag = await createRag({
  mistral: { apiKey: process.env.MISTRAL_API_KEY! },
  qdrant: { url: process.env.QDRANT_URL!, collection: 'my-docs' },
  embeddings: {
    provider: 'openai',
    model: 'text-embedding-3-small',
    apiKey: process.env.OPENAI_API_KEY!,
  },
});

// Ingest a PDF — OCR, chunking, embedding, indexing. One call.
await rag.ingest.file('./contracts/agreement.pdf');

// Query with citations baked in.
const { matches } = await rag.retrieve('What are the termination clauses?');
console.log(matches[0].content);
console.log(matches[0].citation); // { sourceName, pageStart, pageEnd }

Built for developers who ship
Three things the SDK does differently — and why they matter in production.
Citation-first answering
Every answer traces back to the source document and page range. No hallucinations, no "trust me bro." If the evidence isn't there, the SDK refuses to answer by default.
Bring your own LLM (or run it local)
OpenAI, Anthropic, Google Gemini, HuggingFace, or fully local with Ollama. Mix providers — e.g. Gemini for embeddings, Claude for answering. Swap with one config line.
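As a concrete illustration, here is what a mixed-provider setup could look like. The key names mirror the quick-start example above, but the `answering` block and the specific model names are assumptions, not confirmed SDK options.

```typescript
// Hypothetical mixed-provider config. Key names follow the quick-start
// example; the `answering` block and model names are illustrative.
const config = {
  embeddings: {
    provider: 'gemini',             // Gemini generates the vectors
    model: 'text-embedding-004',
    apiKey: process.env.GEMINI_API_KEY,
  },
  answering: {
    provider: 'anthropic',          // Claude writes the cited answer
    model: 'claude-sonnet',
    apiKey: process.env.ANTHROPIC_API_KEY,
  },
};
```

Because each stage names its own provider, swapping either one is a one-line change.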
Multi-tenant out of the box
Ship a SaaS knowledge base in a weekend. Per-tenant Qdrant collections with enforced metadata filters. Async job mode for production pipelines. Typed errors, telemetry hooks, Zod-validated config.
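The per-tenant isolation described above can be sketched with two small helpers. Both `collectionFor` and `tenantFilter` are illustrative, not part of the SDK's public API; only the filter shape itself follows standard Qdrant filter syntax.

```typescript
// Sketch of per-tenant isolation, assuming one Qdrant collection per
// tenant. These helpers are illustrative, not SDK exports.
function collectionFor(tenantId: string): string {
  // Keep collection names predictable and collision-free.
  return `kb_${tenantId.toLowerCase().replace(/[^a-z0-9]/g, '_')}`;
}

function tenantFilter(tenantId: string) {
  // Enforced metadata filter: every query is scoped to the tenant,
  // even if the caller forgets to pass one.
  return { must: [{ key: 'tenantId', match: { value: tenantId } }] };
}

console.log(collectionFor('Acme Inc.')); // kb_acme_inc_
```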
Start free. Scale when you're ready.
Same SDK across every tier. Upgrade when you're tired of running infra.
Self-Host
Free forever
For developers and open-source projects
- Full SDK, MIT licensed
- All file formats + OCR
- Any LLM provider (incl. Ollama)
- Multi-tenant support
- Community support via GitHub
- Unlimited everything (your infra)
npm install @aialchemy/rag-sdk
View on GitHub

Managed
A$5 / month
For teams that want to skip the ops
- Everything in Self-Host
- 1 GB vector storage
- 10,000 OCR pages / month
- 1 project
- Hosted vector DB + document storage
- One API key, zero infra
- Email support
Enterprise
Custom
For teams with scale, compliance, or custom needs
- Everything in Managed
- Dedicated infrastructure
- SSO, audit logs, custom SLA
- VPC deployment options
- Volume pricing on OCR + storage
- Dedicated support engineer
All prices in AUD. Managed service launching soon — waitlist members get priority access and early-bird pricing.
Managed service launching soon
Self-host today with the open-source SDK. Join the waitlist to be first on the managed service — same SDK, zero ops, from A$5/month.
Prefer to self-host? npm install @aialchemy/rag-sdk and you're done.
Frequently asked questions
Is it actually free?
Yes. The SDK is MIT licensed and free forever when you self-host. You only pay for your own LLM API calls and your own Qdrant and storage infrastructure. No seat limits, no usage caps, no free-tier gotchas.
What's the catch with the A$5 managed plan?
No catch — it is the entry price. You get 1 GB of vector storage, 10,000 OCR pages per month, and one project. When you outgrow it, pricing scales with your usage. The SDK itself is identical across self-host and managed, so there is no lock-in.
Can I migrate from self-host to managed later?
Yes. It is the same SDK and the same API. You swap the config to point at our managed endpoint — the rest of your code does not change.
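A hypothetical before/after of that swap. The managed service is not yet public, so the `endpoint` and `apiKey` keys shown here are placeholders, not documented options.

```typescript
// Self-host: you run Qdrant and bring your own provider keys.
const selfHost = {
  qdrant: { url: 'http://localhost:6333', collection: 'my-docs' },
};

// Managed (hypothetical shape): one API key, hosted storage.
// The real config keys may differ once the managed service launches.
const managed = {
  endpoint: 'https://rag.example.com', // placeholder URL
  apiKey: process.env.RAG_API_KEY,
};
```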
Which LLM providers are supported?
OpenAI, Anthropic, Google Gemini, HuggingFace, and Ollama for local models. You can mix and match — for example, Gemini for embeddings and Claude for answer generation. Install only the providers you actually use.
Does it work without a cloud LLM at all?
Yes. Point embeddings and answering at a local Ollama instance and the SDK runs fully offline. Useful for privacy-sensitive data or air-gapped environments.
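A sketch of what a fully local configuration could look like. The `ollama` provider name follows the provider list above, but the `baseUrl` option and the model names are illustrative assumptions.

```typescript
// Fully-offline sketch: both stages point at a local Ollama instance.
// Key names mirror the quick-start example; model names are illustrative.
const offlineConfig = {
  qdrant: { url: 'http://localhost:6333', collection: 'private-docs' },
  embeddings: {
    provider: 'ollama',
    model: 'nomic-embed-text',      // runs locally, no API key needed
    baseUrl: 'http://localhost:11434',
  },
  answering: {
    provider: 'ollama',
    model: 'llama3.1',
    baseUrl: 'http://localhost:11434',
  },
};
```

Nothing in this config references a cloud endpoint, which is the point: no document content ever leaves the machine.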
How does citation-backed answering work?
Every retrieved chunk carries the source document name and page range. When the SDK generates an answer, it refuses by default if the retrieved evidence is not sufficient. Configurable via refuse, warn, or allow modes.
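The three modes can be summarized as a small decision table. This is illustrative logic only; `answerPolicy` is a hypothetical helper, not the SDK's implementation.

```typescript
type EvidenceMode = 'refuse' | 'warn' | 'allow';

// Illustrative decision logic for the three evidence modes. The SDK's
// actual thresholds and return shape are not specified here.
function answerPolicy(evidenceFound: boolean, mode: EvidenceMode) {
  if (evidenceFound) return { answer: true, warning: false };
  switch (mode) {
    case 'refuse':
      return { answer: false, warning: false }; // default: no evidence, no answer
    case 'warn':
      return { answer: true, warning: true };   // answer, but flag it
    case 'allow':
      return { answer: true, warning: false };  // caller accepts the risk
  }
}
```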
What are the system requirements?
Node.js 22 or later. TypeScript-first but works in plain JavaScript. ESM-only. You will need a Mistral API key for OCR and a Qdrant instance (self-hosted or cloud).
Ship RAG this afternoon.
Open source. TypeScript-first. Free to self-host, A$5/month managed when you're ready.