Genkit is an open-source framework for building production AI applications in TypeScript/JavaScript, Go, and Python. It's built and maintained by the Firebase team at Google, and it ships as a collection of packages that give you a unified interface across model providers, a plugin system for tools and retrievers, and a local dev UI for debugging execution traces.
## Why I starred it
The AI framework space is crowded. LangChain, LlamaIndex, Vercel AI SDK — there are options. What pushed me toward Genkit was the architecture. Most alternatives wrap the model APIs either too thinly (little more than streaming fetch calls) or too thickly (magic chains you can't inspect). Genkit threads the needle with an action registry, a real middleware pipeline, and a developer UI that shows you exactly what happened in each tool call without requiring you to add logging.
The fact that Firebase uses it in production and ships regular releases (they're at v1.32.0 at the time I starred this, with a 1.33-rc a week out) also matters. This isn't an experimental repo.
## How it works
The core abstraction is an action — a typed, observable, streamable function registered in a central `Registry`. Flows, models, tools, retrievers, embedders — everything is an action. When you call `ai.defineFlow()`, you're registering an action of type `'flow'`. When you call `ai.defineTool()`, you get an action of type `'tool'`. The registry (`js/core/src/registry.ts`) holds 14 distinct action types and resolves them by string key.
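To make the everything-is-an-action idea concrete, here's a minimal sketch of a string-keyed registry. This is illustrative only: `MiniRegistry` and `Action` are my names, not Genkit's types, and the real registry also handles streaming, telemetry, and async resolution.

```typescript
// Illustrative sketch: actions are typed functions stored under
// "/<type>/<name>" keys and resolved by string lookup.
type ActionType = 'flow' | 'tool' | 'model' | 'retriever';

interface Action<I, O> {
  type: ActionType;
  name: string;
  fn: (input: I) => Promise<O>;
}

class MiniRegistry {
  private actions = new Map<string, Action<unknown, unknown>>();

  register<I, O>(action: Action<I, O>): void {
    // Keyed by type and name, the same shape Genkit resolves by.
    this.actions.set(
      `/${action.type}/${action.name}`,
      action as unknown as Action<unknown, unknown>
    );
  }

  lookup(type: ActionType, name: string): Action<unknown, unknown> | undefined {
    return this.actions.get(`/${type}/${name}`);
  }
}

const registry = new MiniRegistry();
registry.register({
  type: 'tool',
  name: 'getWeather',
  fn: async (city: string) => `sunny in ${city}`,
});

const found = registry.lookup('tool', 'getWeather');
```

The string-key scheme is what lets everything (tools, models, flows) share one resolution path.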
The generate loop lives in `js/ai/src/generate/action.ts`. It's worth reading in full. The core `generate()` function handles:
- Parameter resolution (model ref → model action, tool names → tool actions)
- Middleware dispatch — each model call runs through a chain of middleware before hitting the actual model action
- Tool request execution — after the model responds, it scans the message for `toolRequest` parts and runs them
- Agentic looping — it recurses with `currentTurn + 1` until no tool requests remain, or until `maxTurns` (default 5) is hit
The tool-loop section from `generate()`:

```ts
const maxIterations = rawRequest.maxTurns ?? 5;
if (currentTurn + 1 > maxIterations) {
  throw new GenerationResponseError(
    response,
    `Exceeded maximum tool call iterations (${maxIterations})`,
    'ABORTED',
    { request }
  );
}

const { revisedModelMessage, toolMessage, transferPreamble } =
  await resolveToolRequests(registry, rawRequest, generatedMessage);

// if an interrupt message is returned, stop the tool loop and return a response
if (revisedModelMessage) {
  return {
    ...response.toJSON(),
    finishReason: 'interrupted',
    finishMessage: 'One or more tool calls resulted in interrupts.',
    message: revisedModelMessage,
  };
}
```
The interrupt mechanism is a first-class concept here. Tools can throw a `ToolInterruptError` to pause execution and surface a pending approval back to the caller — no polling, no state machine you wire yourself. The `tool.respond()` and `tool.restart()` helpers on `ToolAction` (`js/ai/src/tool.ts`) let you resume from where the interrupt fired with a typed response.
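Here's a stripped-down model of that pattern. All names (`InterruptError`, `runTool`, `ToolResult`) are mine, not Genkit's; the real types live in `js/ai/src/tool.ts` and carry far more metadata. The point is just the control flow: a sentinel error turns into a structured "interrupted" result instead of crashing the loop.

```typescript
// Hypothetical sketch of the interrupt control flow, not Genkit's actual API.
class InterruptError extends Error {
  constructor(public metadata: Record<string, unknown>) {
    super('tool interrupted');
  }
}

type ToolResult =
  | { status: 'done'; output: string }
  | { status: 'interrupted'; metadata: Record<string, unknown> };

function runTool(tool: (input: string) => string, input: string): ToolResult {
  try {
    return { status: 'done', output: tool(input) };
  } catch (e) {
    if (e instanceof InterruptError) {
      // Surface the pending approval to the caller instead of crashing.
      return { status: 'interrupted', metadata: e.metadata };
    }
    throw e;
  }
}

// A tool that refuses to act without human approval.
const deleteFile = (path: string): string => {
  throw new InterruptError({ confirm: `really delete ${path}?` });
};

const result = runTool(deleteFile, '/tmp/report.txt');
const pending = result.status === 'interrupted' ? result.metadata : null;
```

In real Genkit the caller would later feed an approval back via `tool.respond()`; this sketch only shows the pause half.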
Structured output is handled through a format system. `resolveFormat()` looks up a registered formatter (JSON, text, media, etc.), then `injectInstructions()` appends schema-specific instructions to the prompt before it hits the model. The schema itself flows through as either a Zod type or raw JSON Schema via `toJsonSchema()`, so you're not locked into Zod if you already have schemas elsewhere.
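The idea is simple enough to sketch. This is an illustrative reconstruction, not Genkit's actual formatter interface: a formatter derives output instructions from a schema, and those get appended to the prompt before the model call.

```typescript
// Illustrative sketch of the format-system idea (names are mine).
interface Formatter {
  name: string;
  instructions: (schema: object) => string;
}

const jsonFormatter: Formatter = {
  name: 'json',
  instructions: (schema) =>
    `Output must be valid JSON conforming to this schema:\n${JSON.stringify(schema)}`,
};

function injectInstructions(
  prompt: string,
  formatter: Formatter,
  schema: object
): string {
  // The schema could come from Zod (converted to JSON Schema) or be raw
  // JSON Schema you already had; the formatter only sees plain objects.
  return `${prompt}\n\n${formatter.instructions(schema)}`;
}

const schema = { type: 'object', properties: { temp: { type: 'number' } } };
const finalPrompt = injectInstructions(
  'What is the weather in Paris?',
  jsonFormatter,
  schema
);
```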
The plugin interface (`js/genkit/src/plugin.ts`) is clean: a plugin is an object with an optional initializer that returns actions and models, and an optional resolver for lazy-loading. The new plugin v2 format switches from registration at init time to a resolver callback, which means plugin actions don't get loaded until something actually requests them — useful for large tool suites.
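A rough sketch of that lazy-resolution idea, with hypothetical names (`PluginV2`, `LazyRegistry` are mine, not Genkit's interface): nothing is materialized at construction time, and the cost of loading an action is paid once, on first lookup.

```typescript
// Sketch of resolver-based lazy loading (illustrative names only).
type LazyAction = { name: string; fn: (input: string) => string };

interface PluginV2 {
  name: string;
  resolve: (actionName: string) => LazyAction | undefined;
}

class LazyRegistry {
  private cache = new Map<string, LazyAction>();
  constructor(private plugins: PluginV2[]) {}

  lookup(name: string): LazyAction | undefined {
    const cached = this.cache.get(name);
    if (cached) return cached;
    // Only now do we ask plugins to materialize the action.
    for (const p of this.plugins) {
      const action = p.resolve(name);
      if (action) {
        this.cache.set(name, action);
        return action;
      }
    }
    return undefined;
  }
}

let resolved = 0;
const bigToolSuite: PluginV2 = {
  name: 'bigToolSuite',
  resolve: (n) => {
    resolved++; // counts how often we pay the loading cost
    return n === 'search' ? { name: n, fn: (q) => `results for ${q}` } : undefined;
  },
};

const reg = new LazyRegistry([bigToolSuite]);
// resolved is still 0 here; the resolver only fires on lookup.
const search = reg.lookup('search');
reg.lookup('search'); // second lookup hits the cache
```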
## Using it
Install and wire up a model:
```bash
npm install genkit @genkit-ai/google-genai
```
```ts
// genkit re-exports Zod as `z`, so no separate zod install is needed
import { genkit, z } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';

const ai = genkit({ plugins: [googleAI()] });

// Define a typed tool
const getWeather = ai.defineTool(
  {
    name: 'getWeather',
    description: 'Get current weather for a city',
    inputSchema: z.object({ city: z.string() }),
    outputSchema: z.object({ temp: z.number(), condition: z.string() }),
  },
  // Stubbed result; a real implementation would call a weather API
  async ({ city }) => ({ temp: 22, condition: 'sunny' })
);

// Run a flow with tool calling
const weatherFlow = ai.defineFlow('weather', async (city: string) => {
  const { text } = await ai.generate({
    model: googleAI.model('gemini-2.5-flash'),
    tools: [getWeather],
    prompt: `What's the weather like in ${city}?`,
  });
  return text;
});
```
Start the dev UI with `genkit start -- tsx src/index.ts`. It runs a local reflection server that the UI polls, giving you a playground for every registered flow and tool, plus full execution traces with per-step latency.
The MCP plugin (`@genkit-ai/mcp`) is worth a mention. It wraps any MCP server as a Genkit tool provider — so if you already have MCP servers, they slot straight into the same tool-calling pipeline without any adaptation code.
## Rough edges
The Python SDK is still alpha and feels it: the API surface is smaller, fewer plugins are available, and the docs lag behind the JS docs.
The dotprompt format (`.prompt` files with Handlebars-style templating) is useful for teams who want prompts out of code, but the syntax is non-standard and the docs don't explain clearly how it interacts with Zod schemas. I had to read `js/ai/src/prompt.ts` directly to understand how `isExecutablePrompt()` decides whether to treat an argument as a prompt action or a raw string.
Firebase-adjacent deployment is clearly the first-class path. Cloud Run and Cloud Functions get first-party guides. Deploying to a plain Node.js server or a non-Google cloud works fine — the `@genkit-ai/express` plugin handles HTTP — but the monitoring dashboards only connect to Firebase or Google Cloud. If you're on AWS or self-hosted, you get OpenTelemetry traces that you can wire to whatever backend you prefer, but the turnkey observability story is Google-only.
No built-in rate limiting or retry logic in the generate loop. If your model provider returns a 429, it throws. You're expected to handle that in middleware, but there's no reference middleware implementation showing how.
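Until a reference implementation exists, here's a hedged sketch of what such a middleware could look like. All names are mine, and I've simplified the middleware signature to a plain `(req, next)` function; the retry-on-429-with-backoff logic is the part that would carry over.

```typescript
// Hedged sketch of a retry middleware (illustrative, not Genkit code).
type Req = { prompt: string };
type Res = { text: string };
type Middleware = (req: Req, next: (req: Req) => Promise<Res>) => Promise<Res>;

function retryMiddleware(maxRetries = 3, baseDelayMs = 100): Middleware {
  return async (req, next) => {
    for (let attempt = 0; ; attempt++) {
      try {
        return await next(req);
      } catch (e) {
        const status = (e as { status?: number }).status;
        // Only retry rate limits, and only up to maxRetries times.
        if (status !== 429 || attempt >= maxRetries) throw e;
        // Exponential backoff before the next attempt.
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  };
}

// Fake "model" that rate-limits its first two calls.
let calls = 0;
const flakyModel = async (req: Req): Promise<Res> => {
  calls++;
  if (calls < 3) throw Object.assign(new Error('rate limited'), { status: 429 });
  return { text: `ok: ${req.prompt}` };
};

const res = await retryMiddleware(3, 1)({ prompt: 'hi' }, flakyModel);
```

Wrapping the model action this way keeps retry policy out of flow code, which is presumably why the framework pushes it to middleware in the first place.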
## Bottom line
If you're building TypeScript AI apps and want something that's production-tested, actively maintained, and gives you real observability without bolting on five separate libraries, Genkit is the most complete option in this space. The interrupt/resume pattern for human-in-the-loop flows is particularly well thought out — most frameworks treat that as an afterthought.
