Browserbase MCP: Cloud Browser Automation Wired Directly to LLMs

April 23, 2025

by Florian Narr

Browser automation has always been one of those things that's straightforward in demos and gnarly in production — stale sessions, dead pages, race conditions when two calls hit the same context. mcp-server-browserbase wraps Browserbase's cloud browser infrastructure and Stagehand's LLM-native page interaction behind six MCP tools, letting any MCP-compatible agent control a real browser without dealing with any of that directly.

What it does

The server exposes six tools over STDIO or HTTP: start, end, navigate, act, observe, and extract. An LLM calls them in sequence — spin up a session, navigate to a URL, click something, pull data. Browserbase runs the actual Chrome instance in the cloud. Stagehand handles translating natural language instructions into Playwright actions.

Why I starred it

The browser-use-via-MCP space is crowded. Playwright MCP, Puppeteer wrappers, a dozen half-baked projects. What's different here is that the cloud execution model actually matters for agentic workloads. When an agent's running autonomously, you don't want a local Chrome instance tied to your machine. Browserbase provides persistent sessions, stealth mode, proxy support, and a live debugger URL you can open in your browser to watch what the agent is doing.

The other thing: Stagehand as the execution layer is a real choice. It's built on Playwright but routes page interactions through an LLM rather than hard-coded CSS selectors, so act("click the login button") can keep working when the DOM changes between runs, where a selector-based script would break.

How it works

The entry point is src/program.ts, which uses Commander to parse CLI flags and then delegates to resolveConfig() in src/config.ts. That function does a three-layer merge: defaults, then anything from CLI options, then environment variables — with undefined-safe deep merging via pickDefined().
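
To make the layering concrete, here's a minimal sketch of an undefined-safe merge in the order described above. It is not the repo's code: the Config fields, the mergeLayers helper, and the body of pickDefined are my own assumptions, and this version is shallow where the real one merges deeply.

```typescript
// Hypothetical config shape; the real one in src/config.ts has more fields.
interface Config {
  browserbaseApiKey?: string;
  modelName?: string;
  keepAlive?: boolean;
}

// Drop keys whose value is undefined so they cannot overwrite earlier layers.
function pickDefined<T extends object>(obj: T): Partial<T> {
  const out: Partial<T> = {};
  for (const key of Object.keys(obj) as Array<keyof T>) {
    const value = obj[key];
    if (value !== undefined) out[key] = value;
  }
  return out;
}

// Later layers win, but only for the keys they actually set (shallow merge).
function mergeLayers(...layers: Config[]): Config {
  return layers.reduce<Config>(
    (acc, layer) => ({ ...acc, ...pickDefined(layer) }),
    {},
  );
}

const defaults: Config = { modelName: "default-model", keepAlive: false };
const fromCli: Config = { modelName: "anthropic/claude-sonnet-4-5", keepAlive: undefined };
const fromEnv: Config = { browserbaseApiKey: "env-key", modelName: undefined };

const resolved = mergeLayers(defaults, fromCli, fromEnv);
```

The point of filtering first is that a flag nobody passed shows up as undefined, and a plain object spread would happily clobber a real default with it.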

The core abstraction is the SessionManager class in src/sessionManager.ts. It's the most interesting file in the repo. Here's what it has to handle: MCP servers can get concurrent tool calls (the agent might call navigate while start is still resolving), and Browserbase sessions go stale if the browser dies. The manager covers both cases.

For concurrency, it uses a promise-as-mutex pattern:

// src/sessionManager.ts
if (this.defaultSessionCreationPromise) {
  return await this.defaultSessionCreationPromise;
}

this.defaultSessionCreationPromise = (async () => {
  try {
    this.defaultBrowserSession = await this.createNewBrowserSession(sessionId, config);
    return this.defaultBrowserSession;
  } finally {
    this.defaultSessionCreationPromise = null;
  }
})();

If two calls hit ensureDefaultSessionInternal at the same time, the second one waits on the same promise instead of spinning up a duplicate session. Simple, correct.
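
If you want to see the pattern behave in isolation, here's a small self-contained sketch. SessionCache, Session, and the timer-based create() are hypothetical stand-ins I made up for illustration; only the shape of the in-flight promise check mirrors the excerpt above.

```typescript
// Hypothetical stand-in for a Browserbase session.
interface Session {
  id: number;
}

class SessionCache {
  private session: Session | null = null;
  private creating: Promise<Session> | null = null;
  public createCount = 0;

  // Simulates slow remote session creation.
  private async create(): Promise<Session> {
    this.createCount += 1;
    await new Promise((resolve) => setTimeout(resolve, 10));
    return { id: this.createCount };
  }

  async ensure(): Promise<Session> {
    if (this.session) return this.session;
    // A creation is already in flight: share its promise instead of starting another.
    if (this.creating) return this.creating;

    this.creating = (async () => {
      try {
        const created = await this.create();
        this.session = created;
        return created;
      } finally {
        // Clear the slot on success and on failure so a later call can retry.
        this.creating = null;
      }
    })();
    return this.creating;
  }
}

// Two concurrent callers should end up with the same session from one creation.
async function demo(): Promise<{ sameSession: boolean; creations: number }> {
  const cache = new SessionCache();
  const [a, b] = await Promise.all([cache.ensure(), cache.ensure()]);
  return { sameSession: a === b, creations: cache.createCount };
}
```

The finally block matters as much as the check: without it, one failed creation would leave a rejected promise parked in the slot and every later call would inherit the failure.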

For stale sessions, it validates by checking stagehand.context.pages() before returning a session. If that throws or returns empty, it closes the dead session and recreates it — including one automatic retry before failing hard.
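
Here's roughly how I read that validate-then-recreate flow, as a sketch under my own assumptions. BrowserSession, isAlive, and getValidSession are hypothetical names, pages() stands in for stagehand.context.pages(), and the attempt count of two is my interpretation of "one automatic retry".

```typescript
// Hypothetical minimal session shape; the real one wraps a Stagehand instance.
interface BrowserSession {
  pages(): unknown[]; // stands in for stagehand.context.pages()
  close(): Promise<void>;
}

// A session counts as alive only if listing pages succeeds and returns at least one.
function isAlive(session: BrowserSession): boolean {
  try {
    return session.pages().length > 0;
  } catch {
    return false; // a throwing context is treated as dead
  }
}

async function getValidSession(
  current: BrowserSession | null,
  create: () => Promise<BrowserSession>,
  maxAttempts = 2, // first attempt plus one automatic retry (assumed)
): Promise<BrowserSession> {
  if (current && isAlive(current)) return current;

  // Best-effort cleanup of the dead session; errors while closing are ignored.
  if (current) await current.close().catch(() => undefined);

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const fresh = await create();
    if (isAlive(fresh)) return fresh;
    await fresh.close().catch(() => undefined);
  }
  throw new Error(`No live session after ${maxAttempts} attempts`);
}
```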

The Context class in src/context.ts sits above SessionManager and is what tools actually interact with. Worth noting: currentSessionId is a getter that delegates directly to sessionManager.getActiveSessionId(), avoiding the desync problem you'd get if you cached the value on the context object.
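
A stripped-down sketch of that delegation, with both classes reduced to the one relevant field. The setActiveSessionId method and the "default" starting value are hypothetical; only the getter calling getActiveSessionId() comes from the repo as described above.

```typescript
class SessionManager {
  private activeSessionId = "default";

  getActiveSessionId(): string {
    return this.activeSessionId;
  }

  // Hypothetical setter, here only so the example can change the active session.
  setActiveSessionId(id: string): void {
    this.activeSessionId = id;
  }
}

class Context {
  constructor(private readonly sessionManager: SessionManager) {}

  // Computed on every read, so there is no cached copy to fall out of date.
  get currentSessionId(): string {
    return this.sessionManager.getActiveSessionId();
  }
}

const manager = new SessionManager();
const context = new Context(manager);
manager.setActiveSessionId("session-2");
// context.currentSessionId now reflects the change without any sync step.
```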

The six tool implementations in src/tools/ are deliberately thin. Here's the handler from act.ts:

async function handleAct(context: Context, params: ActInput): Promise<ToolResult> {
  const action = async (): Promise<ToolActionResult> => {
    const stagehand = await context.getStagehand();
    const result = await stagehand.act(params.action);
    return { content: [{ type: "text", text: JSON.stringify({ success: true, data: result }) }] };
  };
  return { action, waitForNetwork: false };
}

All the browser complexity lives in Stagehand. The tools are just the MCP protocol glue.

Transport is handled in src/transport.ts and supports two modes: STDIO (one server per process, connects directly) and Streamable HTTP (per-session MCP-Session-ID headers, Map<string, StreamableHTTPServerTransport> in memory). The HTTP mode creates a new MCP server per POST — so you can have multiple agents hitting the same running process with isolated sessions.
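
The per-session routing can be pictured with a plain Map. This is a sketch, not the SDK: Transport is a hypothetical stand-in for StreamableHTTPServerTransport, routeRequest is my own name, and the counter-based IDs are only for readability (a real server would want random, unguessable ones).

```typescript
// Hypothetical stand-in for StreamableHTTPServerTransport.
interface Transport {
  sessionId: string;
  handled: number; // number of requests routed to this transport
}

const transports = new Map<string, Transport>();
let nextId = 0; // assumption for the demo; use a random ID in practice

// Route a request by its MCP-Session-ID header value (undefined on first contact).
function routeRequest(sessionIdHeader: string | undefined): Transport | undefined {
  if (sessionIdHeader === undefined) {
    // First request from a client: create an isolated transport and remember it.
    const transport: Transport = { sessionId: `session-${++nextId}`, handled: 1 };
    transports.set(transport.sessionId, transport);
    return transport;
  }
  const existing = transports.get(sessionIdHeader);
  if (existing) existing.handled += 1;
  // undefined means this process has never seen the ID, for example after a restart.
  return existing;
}
```

Each agent keeps sending back the ID it was handed, and everything it does stays on its own transport.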

Using it

STDIO via npx, three env vars:

{
  "mcpServers": {
    "browserbase": {
      "command": "npx",
      "args": ["@browserbasehq/mcp"],
      "env": {
        "BROWSERBASE_API_KEY": "...",
        "BROWSERBASE_PROJECT_ID": "...",
        "GEMINI_API_KEY": "..."
      }
    }
  }
}

To swap Gemini for Claude:

{
  "args": [
    "@browserbasehq/mcp",
    "--modelName", "anthropic/claude-sonnet-4-5",
    "--modelApiKey", "sk-ant-..."
  ]
}

The --keepAlive flag keeps the Browserbase session alive between MCP connections, useful if you want to resume where you left off. --contextId lets you pass a persistent browser context so cookies and localStorage survive across runs.

Rough edges

Test coverage is minimal — there's one smoke test in tests/smoke.test.ts that spawns the server via STDIO and checks that all six tools exist. It explicitly passes fake credentials and never exercises actual browser behavior. Unit tests in src/tools/__tests__/ cover schema validation, not execution. Nothing that runs against a real session.

The in-memory session map in HTTP mode (Map<string, StreamableHTTPServerTransport>) doesn't survive restarts and isn't shared between processes. If you run this as a long-lived server with multiple replicas behind a load balancer, a request can land on a replica that has never seen its MCP-Session-ID, so session routing breaks unless you pin each session to one instance. The README acknowledges this by recommending the hosted version, but it's worth being explicit: this is not production-grade HTTP server infrastructure.

Config resolution has a confusing edge: if you don't set BROWSERBASE_API_KEY, it falls back to the string "dummy-browserbase-api-key" and logs a warning. It doesn't refuse to start. You get a runtime error from Browserbase later, which makes debugging less obvious than a hard fail at startup would.
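
For contrast, this is the kind of fail-fast check I'd have expected at startup. The requireEnv helper is hypothetical and not part of the repo; it is just one way to turn a missing key into an immediate, named error.

```typescript
// Hypothetical helper, not from the repo: refuse to continue if a required variable is unset.
function requireEnv(
  env: Record<string, string | undefined>,
  names: string[],
): Record<string, string> {
  // Empty strings count as missing too.
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const out: Record<string, string> = {};
  for (const name of names) out[name] = env[name] as string;
  return out;
}
```

In a Node entry point you'd call it as requireEnv(process.env, ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"]) before constructing anything, so the error names the variable instead of surfacing later as a failed Browserbase call.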

The git history shows the repo went through a breaking refactor in early 2025 (commit [STG-1681]) that removed several tools — screenshot, get_url, agent — and renamed the npm package from @browserbasehq/mcp-server-browserbase to @browserbasehq/mcp. If you pinned the old package, the new one is the right target.

Bottom line

If you're wiring an LLM agent to a cloud browser and want something that handles session lifecycle correctly, this is a cleaner starting point than rolling your own Playwright MCP wrapper. The self-hosted path makes sense if you need model choice flexibility or want to audit what the browser is doing — for everyone else, the hosted version at https://mcp.browserbase.com/mcp is the faster path.

browserbase/mcp-server-browserbase on GitHub