Metabase: 46k Stars of Clojure-Powered BI That Actually Works

What it does

Metabase is an open-source BI tool that lets non-technical people query databases through a visual interface, while giving engineers a SQL editor and an API they can embed anywhere. It connects to Postgres, MySQL, BigQuery, and dozens of other databases through a pluggable driver system.

Why I starred it

Most BI tools are either too simple to be useful or so complex you need a consultant. Metabase hits a narrow sweet spot: your marketing team can build dashboards without writing SQL, but the underlying architecture is serious engineering. The codebase is Clojure on the backend, React on the frontend, and the query processor pipeline is one of the more interesting pieces of middleware composition I've seen in a production system.

The recent addition of an Agent API caught my eye too. They built first-class support for LLMs to query your data, with hard row limits designed to fit context windows.

How it works

The heart of Metabase is the query processor in src/metabase/query_processor.clj. Every query — whether it comes from the visual builder, the SQL editor, or the API — flows through a middleware pipeline that preprocesses, compiles, executes, and post-processes results.

The main entry point builds the pipeline by reducing middleware functions around a core execution function:

(defn- rebuild-process-query-fn! []
  (alter-var-root #'process-query* (constantly
                                    (reduce
                                     (fn [qp middleware]
                                       (if middleware
                                         (middleware qp)
                                         qp))
                                     process-query**
                                     around-middleware))))

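Each middleware wraps the next, forming a bidirectional stack. Because the reduce wraps each middleware around the function built so far, the last entry in around-middleware ends up outermost and sees the query first. The comments in the source are explicit about this: preprocessing happens bottom-to-top, post-processing top-to-bottom. Like throwing a ball up and catching it on the way down.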
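
To make the ordering concrete, here is a minimal sketch of the same reduce shape with two toy middleware that record when they run. The names (trace, tracing-middleware, execute) are mine for illustration and are not Metabase functions.

```clojure
;; Illustrative only: none of these names come from the Metabase source.
(def trace (atom []))

(defn tracing-middleware
  "Returns a middleware that records when it runs on the way in and on the way out."
  [label]
  (fn [qp]
    (fn [query]
      (swap! trace conj [:pre label])
      (let [result (qp query)]
        (swap! trace conj [:post label])
        result))))

(defn execute
  "Stand-in for the core execution function."
  [query]
  (swap! trace conj [:execute])
  {:rows [[1] [2]] :query query})

;; Same shape as the reduce in the excerpt: nil entries are skipped,
;; every other entry wraps the function built so far.
(def middleware [(tracing-middleware :a) nil (tracing-middleware :b)])

(def process-query
  (reduce (fn [qp mw] (if mw (mw qp) qp))
          execute
          middleware))

(process-query {:source-table 2})
@trace
;; => [[:pre :b] [:pre :a] [:execute] [:post :a] [:post :b]]
```

The last entry in the vector (:b) is the first to see the query and the last to see the result.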

The preprocessing pipeline in src/metabase/query_processor/preprocess.clj has 30+ middleware steps, including parameter substitution, source table resolution, automatic datetime bucketing, permission sandboxing, and join resolution. Each step transforms the MBQL query into a more refined form before compilation hits the driver layer.
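
A rough way to picture "each step refines the query" is a list of plain query-to-query functions threaded in order. The two steps below, the table-ids lookup, and the default limit of 100 are all hypothetical; they are not Metabase's actual middleware or defaults.

```clojure
;; Hypothetical steps and values, chosen only to illustrate the idea.
(def table-ids {"orders" 2})

(defn resolve-source-table
  "Replaces a table name with its numeric id; leaves anything else untouched."
  [query]
  (update-in query [:query :source-table] #(get table-ids % %)))

(defn add-default-limit
  "Adds a row limit when the query does not set one."
  [query]
  (update-in query [:query :limit] #(or % 100)))

(defn preprocess
  "Threads the query through each step in order."
  [query]
  (reduce (fn [q step] (step q))
          query
          [resolve-source-table add-default-limit]))

(preprocess {:database 1 :type :query :query {:source-table "orders"}})
;; => {:database 1, :type :query, :query {:source-table 2, :limit 100}}
```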

What makes this elegant is the cancellation system in src/metabase/query_processor/pipeline.clj. They use core.async channels as cancellation tokens:

(def ^:dynamic ^clojure.core.async.impl.channels.ManyToManyChannel *canceled-chan*
  nil)

(defn canceled? []
  (some-> *canceled-chan* a/poll!))

Before executing against the database, and again before reducing results, the pipeline checks this channel. If the HTTP connection drops, the query bails at the next check rather than running to completion. Simple, composable, no special exception handling needed.
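
Here is a small sketch of that two-checkpoint pattern. To keep it runnable with nothing beyond clojure.core, I swap the core.async channel for a promise used as the cancellation token; *canceled* and run-query are made-up names, not Metabase's.

```clojure
;; Sketch only: a promise stands in for the core.async channel.
(def ^:dynamic *canceled* nil)

(defn canceled? []
  (boolean (some-> *canceled* realized?)))

(defn run-query
  "Checks the token before executing and again before reducing the rows."
  [execute! reduce-rows]
  (if (canceled?)
    :canceled
    (let [rows (execute!)]
      (if (canceled?)
        :canceled
        (reduce-rows rows)))))

;; Simulate the client disconnecting while the database is still working.
(let [token (promise)]
  (binding [*canceled* token]
    (run-query (fn [] (deliver token true) [[1] [2]])
               count)))
;; => :canceled
```

With no token bound, or one that is never delivered, the same call just returns the reduced result.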

The driver system in src/metabase/driver.clj uses Clojure's multimethod dispatch with a custom hierarchy. Drivers register themselves, and the hierarchy lets SQL-JDBC drivers inherit from :sql, which inherits from the base :metabase.driver/driver. A ReentrantReadWriteLock in src/metabase/driver/impl.clj prevents race conditions during lazy driver loading — threads can read concurrently but block when a driver is being initialized:

(defonce ^:private ^ReentrantReadWriteLock load-driver-lock
  (ReentrantReadWriteLock.))

This is the kind of detail you only add after hitting a real concurrency bug in production. Issue #13114 is referenced directly in the comments.
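
Both ideas fit in a short sketch: multimethod dispatch through a custom hierarchy, and a read/write lock around lazy loading. The driver keywords, quote-identifier, and ensure-loaded! are hypothetical, and the lock usage is one common pattern rather than a copy of what Metabase does.

```clojure
(import 'java.util.concurrent.locks.ReentrantReadWriteLock)

;; Hypothetical driver names and methods, for illustration only.
(def hierarchy
  (-> (make-hierarchy)
      (derive :sql ::driver)
      (derive :sql-jdbc :sql)
      (derive :postgres :sql-jdbc)
      (derive :mysql :sql-jdbc)))

(defmulti quote-identifier
  "Dispatches on the driver keyword. Drivers without their own method inherit a parent's."
  (fn [driver _identifier] driver)
  :hierarchy #'hierarchy)

(defmethod quote-identifier ::driver [_ identifier]
  (str "\"" identifier "\""))

(defmethod quote-identifier :mysql [_ identifier]
  (str "`" identifier "`"))

(def ^ReentrantReadWriteLock load-lock (ReentrantReadWriteLock.))
(def loaded (atom #{}))

(defn ensure-loaded!
  "Takes the shared read lock for the already-loaded check, and the exclusive
  write lock only when the driver still has to be initialized."
  [driver]
  (let [read-lock  (.readLock load-lock)
        write-lock (.writeLock load-lock)]
    (.lock read-lock)
    (let [loaded? (try (contains? @loaded driver)
                       (finally (.unlock read-lock)))]
      (when-not loaded?
        (.lock write-lock)
        (try
          ;; Re-check: another thread may have loaded it while we waited.
          (when-not (contains? @loaded driver)
            (swap! loaded conj driver))
          (finally (.unlock write-lock)))))
    driver))

(quote-identifier :postgres "orders") ;; => "\"orders\"" (inherited from the base driver)
(quote-identifier :mysql "orders")    ;; => "`orders`"   (its own override)
```

The read lock is released before the write lock is taken, since a ReentrantReadWriteLock cannot upgrade a read lock to a write lock.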

The new Agent API in src/metabase/agent_api/api.clj is purpose-built for LLM integrations. It caps query results at 200 rows (max-query-row-limit) and supports pagination via continuation tokens — designed for context windows rather than human consumption. Entities are exposed as typed schemas with snake_case JSON encoding, and tool results follow a structured output pattern that maps cleanly to function-calling interfaces.
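
To show how a hard cap and continuation tokens fit together, here is a toy pager. Only the 200-row cap and the max-query-row-limit name come from the source; the token format (the next offset as a string), the :continuation_token key, and fetch-page are my assumptions, not the Agent API's actual contract.

```clojure
;; Assumed shapes throughout; the real token format is not described in the post.
(def max-query-row-limit 200)

(defn fetch-page
  "Returns at most max-query-row-limit rows, plus a token when more rows remain."
  [all-rows continuation-token]
  (let [offset      (if continuation-token (Long/parseLong continuation-token) 0)
        page        (vec (take max-query-row-limit (drop offset all-rows)))
        next-offset (+ offset (count page))]
    (cond-> {:rows page}
      (< next-offset (count all-rows)) (assoc :continuation_token (str next-offset)))))

;; A caller (say, an LLM tool loop) keeps asking until no token comes back.
(def all-rows (range 450))

(loop [token nil, sizes []]
  (let [{:keys [rows continuation_token]} (fetch-page all-rows token)
        sizes (conj sizes (count rows))]
    (if continuation_token
      (recur continuation_token sizes)
      sizes)))
;; => [200 200 50]
```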

Using it

The quickest way to try it:

docker run -d -p 3000:3000 --name metabase metabase/metabase

Five minutes later you have a running instance at localhost:3000. Connect a database, and the visual query builder lets you drag and drop to build queries without SQL. For engineers, the native SQL editor supports template variables:

SELECT * FROM orders
WHERE created_at > {{start_date}}
  AND status = {{status}}

These become filter widgets on the dashboard. The API is straightforward:

curl -X POST http://localhost:3000/api/dataset \
  -H "Content-Type: application/json" \
  -H "X-Metabase-Session: $SESSION" \
  -d '{"database": 1, "type": "query", "query": {"source-table": 2}}'
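
Back on the template variables for a moment: one common way to handle {{name}} placeholders is to turn them into bound parameters instead of splicing values into the SQL string. The sketch below does that with a regex. bind-template is a hypothetical helper, and I am not claiming this is how Metabase substitutes parameters internally.

```clojure
(require '[clojure.string :as str])

(defn bind-template
  "Replaces each {{name}} with a ? placeholder and collects the values in order.
  Throws when a variable has no value."
  [sql params]
  (let [names   (map second (re-seq #"\{\{\s*(\w+)\s*\}\}" sql))
        missing (remove #(contains? params (keyword %)) names)]
    (when (seq missing)
      (throw (ex-info "Missing template variables" {:missing (vec missing)})))
    {:sql    (str/replace sql #"\{\{\s*\w+\s*\}\}" "?")
     :params (mapv #(get params (keyword %)) names)}))

(bind-template
 "SELECT * FROM orders WHERE created_at > {{start_date}} AND status = {{status}}"
 {:start_date "2024-01-01" :status "paid"})
;; => {:sql "SELECT * FROM orders WHERE created_at > ? AND status = ?",
;;     :params ["2024-01-01" "paid"]}
```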

For embedding, you can drop entire dashboards into your app with an iframe or use their React SDK for component-level integration.

Rough edges

The codebase is massive. The src/metabase directory has 80+ top-level namespaces and the query processor middleware alone has 30+ files. Contributing means understanding both Clojure and MBQL, Metabase's substantial custom query language. The learning curve is steep even for experienced developers.

The Clojure backend and React frontend mean two build systems, two dependency managers, and two sets of tooling. The frontend has a large legacy codebase that has been incrementally modernized over the years.

Self-hosting at scale requires tuning the JVM. The default Docker image works for small teams, but once you're past a few dozen concurrent users, you're configuring heap sizes, connection pools, and likely putting it behind a reverse proxy.

The commercial features (embedding, sandboxing, audit logs) live in the same repo but under a different license. It's clear what's AGPL and what's commercial, but you need to read the license files carefully if you're building on top of it.

Bottom line

Metabase is the BI tool I'd recommend to any team that needs dashboards and doesn't want to pay Looker prices. The Clojure middleware architecture is genuinely well-engineered, and the new Agent API makes it a strong foundation for AI-powered data querying. Best suited for teams of 5-200 who need self-serve analytics without a dedicated data engineering team.