X Bookmarks — 2024 KW13: Advanced RAG and an analytics tool going open source

March 28, 2024

|bookmarks

by Florian Narr


@LangChain — Advanced RAG series: generation and evaluation

👨‍🎓Advanced RAG Series: Generation and Evaluation

In the fifth part of this series, we look at techniques for:

🔊 Generation (CRAG, Self-RAG, RRR)
🧮 Evaluation (RAGAs, Langsmith, DeepEval)

Another awesome blog by @divyanshu_van

Smart, because most RAG content stops at retrieval quality and ignores that your generation step can actively compensate for bad context. CRAG (Corrective RAG) and Self-RAG are the interesting ones here — CRAG grades retrieved docs and falls back to web search if they score too low, while Self-RAG adds reflection tokens to control when retrieval even happens. The evaluation half is equally useful: RAGAs gives you quantitative metrics (faithfulness, answer relevance, context recall) that you can track across iterations instead of eyeballing outputs.
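To make the corrective step concrete, here is a minimal sketch of the CRAG idea in plain Python. Everything in it is a stand-in: `grade_doc` is a toy term-overlap grader (real CRAG uses an LLM-based evaluator), and `web_search` is a hypothetical callback, not any real library's API.

```python
def grade_doc(query: str, doc: str) -> float:
    """Toy relevance grader: fraction of query terms found in the doc.
    A real CRAG setup would use an LLM or trained evaluator here."""
    terms = query.lower().split()
    return sum(t in doc.lower() for t in terms) / len(terms)

def corrective_retrieve(query, docs, threshold=0.5, web_search=None):
    """Keep retrieved docs that score above the threshold; if none
    survive, fall back to web search -- the 'corrective' step in CRAG."""
    kept = [d for d in docs if grade_doc(query, d) >= threshold]
    if not kept and web_search is not None:
        kept = web_search(query)  # hypothetical fallback callback
    return kept

docs = ["RAG combines retrieval with generation.", "Bananas are yellow."]
print(corrective_retrieve("retrieval generation", docs))
```

The point is the control flow, not the grader: generation only ever sees context that passed a quality gate, which is what lets the generation step compensate for a bad retrieval pass.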


@CarlLindesvard — OpenPanel open-sourcing if the tweet hits 50 likes

I'll release openpanel.dev source code today if this gets 50 likes 🥰

Let's see if we go public today or next week 🥳

#buildinpublic #opensource

Honestly a fun way to do a launch moment — turn the open source decision into a community milestone rather than just a changelog entry. openpanel.dev is a Mixpanel/Plausible-style product analytics tool. It got the likes. The repo is now public. Worth watching if you're building something and want a self-hostable alternative to Mixpanel that isn't Matomo.


@DataScienceDojo — Choosing vector embeddings

Think of turning words into a special code that helps your AI understand what you want. That's what vector embeddings do!

This post breaks down the key choices you need to make, like what kind of task you're tackling (think: finding similar products or understanding feelings in text)

The framing is very beginner-level, but the underlying point is valid and often skipped: embedding model selection is task-specific. Semantic similarity, classification, and clustering each reward different model choices, and most tutorials just throw text-embedding-ada-002 at everything. Saved this as a reference to share when someone on the team asks why their RAG retrieval quality is flat.