Vanna: Text-to-SQL via RAG, Now with a Full Agent Runtime

Vanna converts natural language questions into SQL queries. You feed it DDL, documentation, and example question/SQL pairs as training data; it retrieves the most relevant context via embeddings and passes that context to an LLM to generate the query.

Why I starred it

Most text-to-SQL tools just prompt the LLM with the schema and hope for the best. Vanna's core idea is different: it maintains a vector store of your training examples and retrieves similar past queries alongside the DDL before generating SQL. The result is that query quality improves as you add corrections — it learns your specific naming conventions, common joins, and business logic. That feedback loop is the part worth understanding.

Version 2.0 also added something I hadn't seen other projects wire up this cleanly: a user-aware agent runtime where identity flows from the HTTP request into every tool execution and row-level security filter, not just the authentication layer.

How it works

The original architecture lives in src/vanna/legacy/base/base.py. The generate_sql method chains five calls:

def generate_sql(self, question: str, allow_llm_to_see_data=False, **kwargs) -> str:
    # 1. get_similar_question_sql   — vector search over past Q/SQL pairs
    # 2. get_related_ddl            — vector search over schema definitions
    # 3. get_related_documentation  — vector search over free-form docs
    # 4. get_sql_prompt             — assembles context into a prompt
    # 5. submit_prompt              — sends to LLM, returns SQL string

Each step is an abstract method, making the whole thing composable. You mix-and-match LLM backends (OpenAI, Anthropic, Ollama, Gemini, Bedrock) with vector stores (ChromaDB, Pinecone, Qdrant, pgvector, FAISS) by subclassing. There are 25+ integrations in src/vanna/legacy/.

The 2.0 rewrite in src/vanna/core/ keeps that retrieval idea but restructures around an Agent class with a ToolRegistry. Tools are typed with Pydantic generics:

class Tool(ABC, Generic[T]):
    @property
    def access_groups(self) -> List[str]:
        return []  # empty = accessible to all

    @abstractmethod
    def get_args_schema(self) -> Type[T]: ...

    @abstractmethod
    async def execute(self, context: ToolContext, args: T) -> ToolResult: ...

    def get_schema(self) -> ToolSchema:
        # auto-generates JSON schema from Pydantic model for LLM tool-calling
        schema = args_model.model_json_schema()
        return ToolSchema(name=self.name, description=self.description,
                          parameters=schema, access_groups=self.access_groups)

The access_groups property is where permissions enter. When you register RunSqlTool with access_groups=["read_sales"], the ToolRegistry.get_schemas(user) call filters tools based on the user's group_memberships. The LLM never sees tools the user can't execute — it's not a runtime check after the fact, it's excluded from the tool list entirely.

The RunSqlTool in src/vanna/tools/run_sql.py does something subtle with result handling: for large SELECT results it truncates the CSV preview to 1000 characters and appends an all-caps instruction to the LLM to skip summarizing and call VISUALIZE_DATA instead. A bit of prompt engineering baked into the tool return value, which keeps the agent from generating verbose text about data it can't fully see.

The LegacyVannaAdapter in src/vanna/legacy/adapter.py wraps old VannaBase instances as a ToolRegistry, so 0.x setups get the new web UI without a full migration. The migration path is pragmatic: it bridges both worlds rather than forcing a rewrite.

Using it

The minimal FastAPI setup:

from vanna import Agent
from vanna.servers.fastapi.routes import register_chat_routes
from vanna.integrations.anthropic import AnthropicLlmService
from vanna.tools import RunSqlTool
from vanna.integrations.sqlite import SqliteRunner
from vanna.core.registry import ToolRegistry

llm = AnthropicLlmService(model="claude-sonnet-4-5")
tools = ToolRegistry()
tools.register(RunSqlTool(sql_runner=SqliteRunner("./data.db")))

agent = Agent(llm_service=llm, tool_registry=tools)
register_chat_routes(app, agent)

Then drop the web component anywhere:

<script src="https://img.vanna.ai/vanna-components.js"></script>
<vanna-chat sse-endpoint="/api/vanna/v2/chat_sse" theme="dark"></vanna-chat>

The SSE endpoint streams structured UI components — tables, charts, status updates — not just text. The web component renders them.

For the legacy RAG-based workflow, training still works via:

vn.train(ddl="CREATE TABLE orders (...)")
vn.train(question="What are monthly sales?", sql="SELECT ...")
vn.ask("What are the top customers by revenue?")

Rough edges

The 2.0 README is a complete rewrite with no mention of how the RAG retrieval integrates into the new agent architecture. It's unclear whether AgentMemory in src/vanna/capabilities/agent_memory.py replaces the vector store retrieval or sits alongside it. The original VannaBase.generate_sql pipeline is entirely absent from the new docs.

Test coverage is uneven. The tests/ directory has 14 files, but most target specific integrations (ChromaDB, Gemini, Ollama) rather than the core agent loop. test_agents.py and test_workflow.py exist but aren't comprehensive end-to-end tests.

The pyproject.toml [all] extras list is 20+ packages including heavy ML dependencies like transformers and pymilvus[model]. The base install is lean (pydantic, pandas, httpx), but newcomers who pip install vanna[all] will wait a while.

The v2 branch exists separately from main in the repo — as of stargazing, the 2.0 README is on main but the underlying code structure suggests active migration work is still ongoing.

Bottom line

If you're building a data Q&A interface and want something with real permission semantics rather than a thin wrapper around an LLM prompt, Vanna 2.0's tool-registry approach is worth the integration cost. The legacy 0.x path still works for quick prototypes where you just want vn.ask() against an existing database.