Reor: Local AI Knowledge Management Built on RAG

February 18, 2024

repo-review

by Florian Narr


Reor is an Electron desktop app for personal knowledge management. Every note you write gets chunked, embedded, and stored in a local LanceDB vector database. Related notes surface automatically through vector similarity. LLMs answer questions over your corpus via RAG. Nothing leaves your machine.
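The "related notes" mechanic boils down to nearest-neighbor search over embedding vectors. As a minimal illustration (not Reor's code — LanceDB handles this internally with ANN indexes), the similarity primitive is just cosine similarity:

```typescript
// Minimal sketch of the similarity measure behind "related notes".
// Illustrative only; the real app delegates this to LanceDB.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```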

Why I starred it

Most "AI note-taking" tools send your notes to a remote API. Reor doesn't. It runs Transformers.js models in the Electron main process — no Python server, no cloud calls. The entire inference stack runs in Node via WASM. That's a specific architectural choice with real tradeoffs, and I wanted to understand how far they pushed it.

The framing in the README is sharp: "a RAG app with two generators — the LLM and the human." In Q&A mode, the LLM is fed retrieved context. In editor mode, the sidebar shows related notes retrieved from the same corpus. The human and the model both consume the same retrieval pipeline.

How it works

The retrieval stack lives in electron/main/vector-database/. Here's what happens when you open a vault:

Chunking (electron/main/common/chunking.ts) splits each markdown file first by headings, then uses LangChain's RecursiveCharacterTextSplitter on chunks that exceed a configurable size threshold:

```ts
export const chunkMarkdownByHeadingsAndByCharsIfBig = async (markdownContent: string): Promise<string[]> => {
  const chunksByHeading = chunkMarkdownByHeadings(markdownContent)
  // Route each heading-delimited section by size
  const chunksWithSmallChunksSplit: string[] = []
  const chunksWithBigChunksSplit: string[] = []
  chunksByHeading.forEach((chunk) => {
    if (chunk.length > chunkSize) {
      chunksWithBigChunksSplit.push(chunk)
    } else {
      chunksWithSmallChunksSplit.push(chunk)
    }
  })
  // Oversized sections fall back to LangChain's recursive character splitter
  const chunkedRecursively = await chunkStringsRecursively(chunksWithBigChunksSplit, chunkSize, chunkOverlap)
  return chunksWithSmallChunksSplit.concat(chunkedRecursively)
}
```

The two-pass strategy preserves semantic boundaries from heading structure, then only falls back to character splitting when a section is too large to embed cleanly. Most RAG implementations skip the heading-aware first pass entirely.
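The heading-aware first pass is easy to sketch. This is a hypothetical simplification, not the repo's exact code: start a new chunk at every markdown heading, keeping each heading attached to its body text.

```typescript
// Hypothetical sketch of heading-aware chunking (simplified; not Reor's code).
// Each chunk begins at a markdown heading and runs until the next one.
function chunkByHeadings(markdown: string): string[] {
  const chunks: string[] = []
  let current: string[] = []
  for (const line of markdown.split('\n')) {
    if (/^#{1,6}\s/.test(line) && current.length > 0) {
      chunks.push(current.join('\n'))
      current = []
    }
    current.push(line)
  }
  if (current.length > 0) chunks.push(current.join('\n'))
  return chunks
}
```

Keeping the heading inside its chunk matters: the heading text often carries the keywords that make the chunk retrievable at all.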

Embedding runs via @xenova/transformers in the main process. embeddings.ts supports multiple Hugging Face models — UAE-Large-V1 for English, bge-small for low-power devices, multilingual-e5-large for non-English, and two Jina models for Chinese and German. The createEmbeddingFunction in embeddings.ts returns an EnhancedEmbeddingFunction that wraps the embedding call in a reranker-compatible interface. After initial retrieval, a bge-reranker-base cross-encoder rescores results:

```ts
export const rerankSearchedEmbeddings = async (query: string, searchResults: DBEntry[]) => {
  const tokenizer = await AutoTokenizer.from_pretrained('Xenova/bge-reranker-base')
  const model = await AutoModelForSequenceClassification.from_pretrained('Xenova/bge-reranker-base')
  // Pair the query with every candidate chunk for cross-encoder scoring
  const inputs = tokenizer(new Array(searchResults.length).fill(query), {
    text_pair: searchResults.map((r) => r.content), padding: true, truncation: true,
  })
  const scores = await model(inputs)
  // Attach each logit to its result, sort descending, keep positive scores only
  const resultsWithIndex = searchResults.map((r, i) => ({ ...r, score: scores.logits.data[i] }))
  return resultsWithIndex.sort((a, b) => b.score - a.score).filter((item) => item.score > 0)
}
```

Two-stage retrieval (ANN + reranker) is standard in production RAG systems. Seeing it in a desktop app running WASM-compiled cross-encoders is genuinely unusual.

Storage uses vectordb (LanceDB) at version 0.4.10. The table name encodes both the embedding model and vault path (ragnote_table_<model>_<vault>), so switching embedding models creates a fresh table rather than corrupting an existing one. Schema migrations are handled in lance.ts by comparing the serialized Arrow schema — if the stored schema doesn't match the expected schema, it drops and recreates the table automatically.
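The table-naming scheme described above is worth a quick sketch. This is a hypothetical reconstruction — the repo's exact sanitization may differ — but it shows why switching models can never clobber an existing table: the model name is baked into the table identity.

```typescript
// Sketch of the ragnote_table_<model>_<vault> naming scheme (hypothetical;
// the repo's actual sanitization of special characters may differ).
function generateTableName(embeddingModel: string, vaultPath: string): string {
  const sanitize = (s: string) => s.replace(/[^a-zA-Z0-9]/g, '_')
  return `ragnote_table_${sanitize(embeddingModel)}_${sanitize(vaultPath)}`
}
```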

Context window management (electron/main/llm/contextLimit.ts) deserves a look. Instead of truncating at a token limit, createPromptWithContextLimitFromContent walks retrieved chunks line by line, accumulates token counts via the tokenizer, and stops at 90% of the model's context length. It also captures the first line that didn't fit as contextCutoffAt so the caller can surface it to the user:

```ts
// Inside the line-by-line fold: append the line only while the running
// token count stays under 90% of the model's context window
if (lineTokens + tokenCount < contextLimit * 0.9) {
  return { contentArray: [..._contentArray, lineWithNewLine], ... }
}
// The first line that no longer fits is recorded as the cutoff point
if (_cutOffLine.length === 0) {
  return { contentArray: _contentArray, tokenCount, cutOffLine: lineWithNewLine }
}
```
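A self-contained version of that accumulation loop, with a toy whitespace "tokenizer" standing in for the real one, might look like this (hypothetical and simplified, not the repo's code):

```typescript
// Toy stand-in for the real tokenizer: counts whitespace-separated words.
const countTokens = (text: string): number => text.split(/\s+/).filter(Boolean).length

// Sketch of the 90%-budget accumulation described above (hypothetical, simplified).
function buildContext(lines: string[], contextLimit: number) {
  const budget = contextLimit * 0.9
  const kept: string[] = []
  let tokenCount = 0
  let cutOffLine: string | undefined
  for (const line of lines) {
    const lineTokens = countTokens(line)
    if (tokenCount + lineTokens < budget) {
      kept.push(line)
      tokenCount += lineTokens
    } else if (cutOffLine === undefined) {
      cutOffLine = line // first line that didn't fit, surfaced to the caller
    }
  }
  return { content: kept.join('\n'), tokenCount, cutOffLine }
}
```

The nice property is that truncation happens on line boundaries, so the model never sees a sentence chopped mid-token.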

The incremental diff check in tableHelperFunctions.ts is also solid: RepopulateTableWithMissingItems compares filemodified timestamps between the LanceDB table and the filesystem before re-embedding anything, so vault indexing on reopen is near-instant once the initial run completes.
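The diff logic reduces to a timestamp comparison per file. A sketch with hypothetical types (the repo compares a filemodified column in LanceDB against filesystem mtimes):

```typescript
// Sketch of the incremental re-index check (hypothetical types and names;
// the repo compares LanceDB's stored filemodified values against fs mtimes).
interface IndexedFile { path: string; filemodified: number }

function filesNeedingReindex(
  onDisk: IndexedFile[],   // current filesystem state
  inTable: IndexedFile[],  // what the vector table last saw
): string[] {
  const known = new Map(inTable.map((f) => [f.path, f.filemodified]))
  return onDisk
    .filter((f) => known.get(f.path) === undefined || known.get(f.path)! < f.filemodified)
    .map((f) => f.path)
}
```

Only files that are new or newer than their stored timestamp get re-chunked and re-embedded; everything else is skipped.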

Using it

Download from the releases page or build from source:

```sh
git clone https://github.com/reorproject/reor.git
cd reor
npm install
npm run dev
```

On first launch you point it at a directory of markdown files. Reor indexes everything. In the editor, a sidebar shows the top related notes as you type. The Q&A panel lets you ask questions over the vault — the response cites which notes provided context.

For LLMs, you can connect to Ollama (it talks directly to the Ollama API, not just OpenAI-compatible pass-through), or any OpenAI-compatible endpoint. Reor also downloads and runs models fully locally via the Transformers.js pipeline, no Ollama needed.

Rough edges

The dependency list is a mess. package.json pulls in Tamagui, MUI, Mantine, Radix UI, and Material Tailwind simultaneously: five component libraries in the same app. The bundle is going to be large. There's also react-native as a dependency in an Electron app, which is there only because Tamagui supports React Native targets and drags it along.

There are two test files (database.test.ts, filesystem.test.ts) but no test runner is configured in package.json, so the tests exist but never run in CI. The git history shows rapid iteration — 8k+ stars accumulated quickly — and some of that haste shows in the code: the Windows-specific file rename path in tableHelperFunctions.ts has a separate watcher close/reopen flow that looks like a quick fix rather than a principled design.

The langchain package is only used for RecursiveCharacterTextSplitter. That's a 40MB+ dependency for one utility class.

Docs are thin beyond the README. The contributing guide exists at the project site but the in-repo experience is sparse.

Bottom line

If you want a fully local PKM with actual RAG — not keyword search dressed up as AI — Reor is one of the few desktop options that pulls it off without a Python backend. The two-stage retrieval with reranking is better than most hosted tools manage.

The codebase is rough in places but the core pipeline is genuinely well-designed. Worth trying if you keep a large markdown vault and care about data locality.

reorproject/reor on GitHub