Inbox Zero: Open Source AI Email Assistant Built on Plain-English Rules

December 26, 2023

repo-review

by Florian Narr

What it does

Inbox Zero is a Next.js application that sits on top of Gmail (and Outlook) and enforces natural-language rules on your inbox using an LLM. You write rules like "archive newsletters unless I opened the last 3" or "draft a reply in my tone for anything from my team," and the pipeline matches incoming emails against those rules and executes the corresponding actions automatically.

Why I starred it

Most email automation tools give you filters: from-address matches X, apply label Y. That covers about 20% of the cases where you actually want automation. The interesting 80% require understanding what the email is — is this a cold outreach? Is this a reply that needs a response? Is this newsletter one I actually read?

Inbox Zero's bet is that LLMs are now good enough to evaluate that, reliably enough to trust with automated actions. The architecture around that bet is what caught my attention. It isn't just "call OpenAI on every email." There is a layered matching pipeline: regex and header checks first, AI only when static rules cannot resolve the match, with explicit feedback loops so the model learns from corrections.

The self-hosting path is clean. npx @inbox-zero/cli setup handles the Docker orchestration. For a Gmail-connected productivity tool, that is not trivial.

How it works

The core of the system lives in apps/web/utils/ai/choose-rule/. When an email arrives, run-rules.ts orchestrates the pipeline:

  1. Static matching first. findMatchingRules in match-rules.ts checks rules with ConditionType.FROM, ConditionType.SUBJECT, and ConditionType.BODY fields against the message without touching the LLM. Header-based cold-email detection runs here too — if the message lacks a List-Unsubscribe header and the sender has never gotten a reply, isColdEmail evaluates it against a cold-email-specific prompt.

  2. AI matching for the rest. Rules that are pure natural language go to aiChooseRule in ai-choose-rule.ts. The model receives an XML-wrapped prompt with your user info, all your rules, and the email content. It returns a list of matched rule names, a primary rule designation, and a reasoning field. The promptHardening: { trust: "untrusted", level: "full" } flag is notable: it marks the email body as untrusted input, guarding against prompt injection attempts that could try to hijack rule selection.

  3. Multi-rule vs. single-rule paths. When multiRuleSelectionEnabled is set and you have custom rules, the model runs a multi-rule selection path (getAiResponseMultiRule) that can match several rules to the same email — useful for emails that need to be labelled, tracked for reply, and forwarded simultaneously.

  4. Conversation tracking as a meta-rule. There is a hardcoded meta-rule in run-rules.ts called CONVERSATION_TRACKING_INSTRUCTIONS with a precise exclusion list: LinkedIn, GitHub, Slack, Figma, Jira, Facebook, social platforms, calendar invites, any email with a List-Unsubscribe header. This runs separately from your rules and routes human-to-human emails into the reply tracker.

// From apps/web/utils/ai/choose-rule/ai-choose-rule.ts
const generateObject = createGenerateObject({
  emailAccount,
  label: "Choose rule",
  modelOptions,
  promptHardening: { trust: "untrusted", level: "full" },
});
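Taken together, the four steps form a fallthrough pipeline: cheap static checks first, the LLM only as a last resort. A minimal sketch of that shape, with illustrative names and types rather than the repo's exact signatures:

```typescript
// Sketch of the layered matching flow: static conditions are checked before
// any LLM call. `Rule`, `matchesStatic`, and `aiChoose` are illustrative
// stand-ins, not the repo's actual API.
type Rule = {
  name: string;
  from?: RegExp;          // ConditionType.FROM
  subject?: RegExp;       // ConditionType.SUBJECT
  instructions?: string;  // pure natural-language rule -> needs the LLM
};

type Email = { from: string; subject: string; body: string };

function matchesStatic(rule: Rule, email: Email): boolean {
  if (rule.from && !rule.from.test(email.from)) return false;
  if (rule.subject && !rule.subject.test(email.subject)) return false;
  // Only counts as a static match if the rule has at least one static field.
  return Boolean(rule.from || rule.subject);
}

function chooseRule(
  rules: Rule[],
  email: Email,
  aiChoose: (rules: Rule[], email: Email) => string | null,
): string | null {
  // 1. Static matching first: no LLM call if a regex/header rule resolves it.
  const staticHit = rules.find((r) => matchesStatic(r, email));
  if (staticHit) return staticHit.name;

  // 2. AI matching for the rest: only natural-language rules go to the model.
  const aiRules = rules.filter((r) => r.instructions);
  return aiRules.length > 0 ? aiChoose(aiRules, email) : null;
}
```

The payoff of this ordering is cost and latency: most routine mail never reaches the model at all.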

The action execution in execute.ts is straightforward: it iterates executedRule.actionItems and dispatches to runActionFunction in actions.ts, which is a switch over ActionType values — ARCHIVE, LABEL, DRAFT_EMAIL, REPLY, FORWARD, NOTIFY_MESSAGING_CHANNEL, and more. Draft IDs are tracked in the database so you can see what the AI pre-drafted before sending.
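The dispatch shape is easy to picture. Here is a hedged sketch of a switch over ActionType values; the enum members mirror the ones named in the prose, but the handler signatures and the ActionItem shape are assumptions, not the repo's exact code:

```typescript
// Illustrative action dispatcher: a switch over ActionType, as described
// above. The EmailClient interface and ActionItem fields are assumptions.
type ActionType = "ARCHIVE" | "LABEL" | "DRAFT_EMAIL" | "REPLY" | "FORWARD";

type ActionItem = { type: ActionType; label?: string; content?: string; to?: string };

type EmailClient = {
  archive(threadId: string): void;
  label(threadId: string, label: string): void;
  createDraft(threadId: string, content: string): string; // returns a draft id
  forward(threadId: string, to: string): void;
};

function runAction(
  client: EmailClient,
  threadId: string,
  action: ActionItem,
): string | void {
  switch (action.type) {
    case "ARCHIVE":
      return client.archive(threadId);
    case "LABEL":
      return client.label(threadId, action.label ?? "");
    case "DRAFT_EMAIL":
    case "REPLY":
      // Returning the draft id lets the caller persist it, matching the note
      // that draft IDs are tracked in the database before sending.
      return client.createDraft(threadId, action.content ?? "");
    case "FORWARD":
      return client.forward(threadId, action.to ?? "");
    default:
      throw new Error(`Unhandled action type: ${action.type}`);
  }
}
```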

The model layer in utils/llms/model.ts supports OpenAI, Anthropic, Azure, Amazon Bedrock, Google Vertex, Groq, OpenRouter, and Ollama — all through the Vercel AI SDK. You can bring your own API key, which routes to the default model selection. Without one, the app uses its own hosted models with economy/nano variants for lighter classification tasks.
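The bring-your-own-key routing can be sketched as a small selector. To be clear, the env-var handling, model names, and fallback string below are my assumptions for illustration; only the provider list and the hosted economy-variant behavior come from the article:

```typescript
// Hypothetical model selector: a user-supplied key routes to that provider's
// default model; without one, fall back to a hosted economy variant for
// lighter classification tasks. Model names here are placeholders.
const DEFAULT_MODELS: Record<string, string> = {
  openai: "gpt-4o",
  anthropic: "claude-sonnet",
  groq: "llama-70b",
  ollama: "llama3",
};

function selectModel(provider: string | undefined, apiKey: string | undefined): string {
  if (provider && apiKey && DEFAULT_MODELS[provider]) {
    return DEFAULT_MODELS[provider]; // BYOK path: default model for that provider
  }
  return "hosted-economy"; // hosted path: cheaper variant for classification
}
```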

The feedback loop is worth calling out. In ai-choose-rule.ts, formatClassificationFeedback builds a <classification_feedback> XML block from past manual corrections — emails you moved into or out of rules. That gets injected into the prompt. It is a simple implementation, but it is the right instinct: the model should know when it previously misclassified something from the same sender.
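The feedback block is simple enough to sketch end to end. Assuming a correction record of roughly this shape (the field names are mine, not the repo's), the builder is just string assembly:

```typescript
// Sketch of building a <classification_feedback> XML block from past manual
// corrections, as described above. The Correction shape is an assumption.
type Correction = {
  sender: string;
  chosenRule: string;   // what the model originally picked
  correctRule: string;  // the rule the user moved the email to
};

function formatClassificationFeedback(corrections: Correction[]): string {
  if (corrections.length === 0) return "";
  const items = corrections
    .map(
      (c) =>
        `  <correction sender="${c.sender}">` +
        `model chose "${c.chosenRule}", user moved to "${c.correctRule}"` +
        `</correction>`,
    )
    .join("\n");
  return `<classification_feedback>\n${items}\n</classification_feedback>`;
}
```

Injecting this into the prompt gives the model sender-level history without any fine-tuning: cheap, inspectable, and easy to cap in size.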

Using it

Self-hosted setup:

npx @inbox-zero/cli setup   # Interactive wizard: OAuth, DB, Redis
npx @inbox-zero/cli start   # Starts Postgres, Redis, and the Next.js app
open http://localhost:3000

You can also run the dev stack directly:

git clone https://github.com/elie222/inbox-zero.git
cd inbox-zero
docker compose -f docker-compose.dev.yml up -d  # Postgres + Redis
pnpm install && npm run setup
cd apps/web && pnpm prisma migrate dev && cd ../..
pnpm dev

Rules are set through the UI. You write natural-language criteria, such as "newsletters I haven't opened in the last 30 days", and the system will label matching emails, archive them, or unsubscribe on your behalf. The bulk unsubscriber is a separate flow that scans for senders with unsubscribe links and lets you one-click unsubscribe and archive without writing any rules.
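The raw material for that scan is the List-Unsubscribe header, which (per RFC 2369) carries comma-separated mailto: and/or https: URIs in angle brackets. A minimal parser for it, as an illustration of what such a scan works with rather than the repo's implementation:

```typescript
// Extract unsubscribe targets from a List-Unsubscribe header value, e.g.
// "<mailto:unsub@example.com>, <https://example.com/u?id=1>" (RFC 2369).
// This is a standalone sketch, not code from the Inbox Zero repo.
function parseListUnsubscribe(headerValue: string): string[] {
  const bracketed = headerValue.match(/<([^>]+)>/g) ?? [];
  return bracketed.map((m) => m.slice(1, -1)); // strip the angle brackets
}
```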

Rough edges

The project has evolved fast, and that shows in the codebase. The ARCHITECTURE.md was generated by Gemini: it is accurate but thin, and the actual nuance of how conversation tracking interacts with custom rules is only discoverable by reading run-rules.ts carefully.

Gmail OAuth setup for self-hosting requires creating a Google Cloud project with appropriate scopes, which is non-trivial and not fully documented in the repo itself. The docs site fills some of that in, but the steps for production OAuth verification are notably absent.

The Turborepo monorepo structure (apps/web, plus packages/tinybird, packages/resend, etc.) means you are pulling in more infrastructure than a simple self-hosted productivity tool might want. Tinybird for analytics and Loops for marketing emails are baked into the package graph. None of that blocks running the app, but if you are self-hosting and want a minimal footprint, some of those integrations will just sit unused.

Test coverage is mixed. There are Vitest unit tests for rule matching and argument generation (match-rules.test.ts, ai-choose-args.test.ts), and Playwright smoke tests for end-to-end flows. The AI classification path has eval tooling in qa/, which is the right approach for testing LLM pipelines. But coverage is not comprehensive — the bulk archiver and unsubscriber flows do not have unit tests.

Bottom line

If you process enough email that manual filtering has become a maintenance burden, this is worth spinning up. The rule evaluation architecture is solid, the self-hosting path is functional, and the model layer is flexible enough that you are not locked into a single provider.

elie222/inbox-zero on GitHub