A complete, forkable coding agent — auth, streaming UI, durable workflows, GitHub integration, and an isolated VM sandbox — built on Vercel infrastructure and designed to be taken apart and reassembled.
What it does
Open Agents is a reference application for running background coding agents in the cloud. You give it a GitHub repo and a prompt. It clones the repo into an isolated sandbox VM, runs a multi-step coding agent against it, and optionally commits and opens a pull request — all without keeping your laptop awake. The stack: Next.js for the web app, Vercel Workflows for durability, Vercel Sandboxes for VM execution, and the AI SDK's ToolLoopAgent as the agent runtime.
Why I starred it
Most "build your own coding agent" repos are one file of LLM calls glued to a child_process.exec. This one is a full production architecture. The reason it caught me: the agent and the sandbox are explicitly separated. The agent runs outside the VM and interacts with it through tools. That sounds obvious when you say it, but almost nobody builds it that way. The alternative — putting the agent inside the container — gives you simpler tooling but couples your model execution lifecycle to your VM lifecycle. Here, the sandbox can hibernate and resume independently while the workflow stays alive.
How it works
The agent lives in packages/agent/open-agent.ts. It's built on the AI SDK's ToolLoopAgent:
import { ToolLoopAgent, stepCountIs } from "ai";

export const openAgent = new ToolLoopAgent({
  model: defaultModel,
  instructions: buildSystemPrompt({}),
  tools,
  stopWhen: stepCountIs(1), // halt after a single model step
  callOptionsSchema,
  prepareCall: ({ options, ...settings }) => {
    // Resolves model, injects sandbox context, builds system prompt
    // per-call based on working directory and branch
    ...
    return {
      ...settings,
      model: callModel,
      experimental_context: { sandbox, skills, model, subagentModel },
    };
  },
});
The experimental_context pattern is doing a lot of work here. Rather than passing the sandbox down through every tool as a parameter, each tool calls getSandbox(experimental_context, "bash") and gets access to file ops, exec, and snapshot control through a uniform interface. The sandbox abstraction (packages/sandbox/interface.ts) is clean enough that swapping Vercel Sandboxes for a local Docker runtime would touch a handful of files.
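A tool built on this pattern looks roughly like the sketch below. getSandbox is the repo's helper; the tool itself, its import path, and the readFile method name are illustrative assumptions, not the repo's code.

import { tool } from "ai";
import { z } from "zod";
import { getSandbox } from "./sandbox-context"; // import path illustrative

// Hypothetical tool showing the experimental_context pattern: the sandbox
// rides along in context instead of threading through every signature.
export const readFileTool = tool({
  description: "Read a file from the sandbox working directory",
  inputSchema: z.object({ path: z.string() }),
  execute: async ({ path }, { experimental_context }) => {
    const sandbox = getSandbox(experimental_context, "read_file");
    return sandbox.readFile(path);
  },
});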
The model layer in packages/agent/models.ts is a thin but thoughtful wrapper around the AI SDK gateway. It normalizes provider-specific settings by model prefix: Anthropic models get adaptive thinking at effort: "medium", while GPT-5 variants get store: false plus include: ["reasoning.encrypted_content"] to avoid Responses API failures. The adaptive-thinking check is a plain substring match on the model ID:
function supportsAdaptiveAnthropicThinking(modelId: string): boolean {
  // Adaptive thinking is only wired up for Claude 4.6 / 4.7 model IDs.
  return modelId.includes("4.6") || modelId.includes("4.7");
}
The default model is anthropic/claude-opus-4.6. You can override per-call with any gateway-supported model ID.
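The normalization amounts to branching on the model ID prefix before the call goes through the gateway. A minimal sketch, assuming gateway-style provider/model IDs; the shape of the options objects is illustrative, and only the settings named above come from the repo:

// Illustrative: per-prefix provider options, keyed off the gateway model ID.
function providerOptionsFor(modelId: string) {
  if (modelId.startsWith("anthropic/") && supportsAdaptiveAnthropicThinking(modelId)) {
    return { anthropic: { thinking: { type: "adaptive", effort: "medium" } } };
  }
  if (modelId.startsWith("openai/gpt-5")) {
    // store: false + encrypted reasoning content avoids Responses API failures.
    return { openai: { store: false, include: ["reasoning.encrypted_content"] } };
  }
  return {};
}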
The subagent system in packages/agent/subagents/registry.ts is sparse but intentional. There are three subagent types: explorer (read-only codebase exploration), executor (file changes and scaffolding), and design (frontend UI work). The parent agent dispatches to them via the task tool in packages/agent/tools/task.ts, which uses an async generator to stream intermediate state — tool call count, current tool name, token usage — so the UI can show elapsed time without polling:
// Excerpt from the task tool: stream subagent progress to the UI as it runs.
execute: async function* ({ subagentType, task, instructions }, { ... }) {
  let toolCallCount = 0;
  let pending; // most recent in-flight tool call
  let usage;   // running token usage
  const startedAt = Date.now();
  const result = await subagent.stream({ ... });

  yield { toolCallCount, startedAt, modelId: subagentModelId };
  for await (const part of result.fullStream) {
    if (part.type === "tool-call") {
      toolCallCount += 1;
      pending = { name: part.toolName, input: part.input };
      yield { pending, toolCallCount, usage, startedAt, modelId: subagentModelId };
    }
  }
}
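The registry behind the dispatch is small. Here is a sketch of its shape based on the three types described above; the field names are my guesses, not the repo's:

type SubagentType = "explorer" | "executor" | "design";

// Hypothetical shape of registry.ts; the three entries are the real ones.
interface SubagentDefinition {
  description: string; // what the parent agent sees when picking a subagent
  systemPrompt: string;
  readOnly: boolean;   // explorer gets no write/exec tools
}

const registry: Record<SubagentType, SubagentDefinition> = {
  explorer: { description: "Read-only codebase exploration", systemPrompt: "...", readOnly: true },
  executor: { description: "File changes and scaffolding", systemPrompt: "...", readOnly: false },
  design: { description: "Frontend UI work", systemPrompt: "...", readOnly: false },
};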
The bash tool (packages/agent/tools/bash.ts) sandwiches dangerous-command detection between the tool call and the sandbox exec. It maintains regex lists for destructive commands and flags (rm -rf, find -delete, mkfs) and sensitive file references (.env, ~/.ssh, /proc/self/environ) and surfaces matches to the needsApproval hook — meaning the UI can intercept these before they hit the VM. GitHub tokens are never embedded in sandbox git remotes; instead packages/sandbox/vercel/sandbox.ts brokers credentials via network policy injection on the Vercel Sandbox SDK.
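The check itself can be a straightforward pattern scan wired into the approval hook. A sketch, using only the example patterns named above (the repo's lists are longer):

// Sketch: surface risky commands to the approval hook before sandbox exec.
const DESTRUCTIVE = [/\brm\s+-[a-z]*[rf][a-z]*\b/, /\bfind\b.*-delete\b/, /\bmkfs\b/];
const SENSITIVE = [/\.env\b/, /~\/\.ssh\b/, /\/proc\/self\/environ/];

function isDangerous(command: string): boolean {
  return [...DESTRUCTIVE, ...SENSITIVE].some((re) => re.test(command));
}

// Wired into the tool definition, roughly:
//   needsApproval: ({ command }) => isDangerous(command)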
Using it
Clone and run locally:
bun install
bun run web # dev server at localhost:3000
bun run typecheck # check all packages
bun run ci # lint + typecheck + tests + migration check
Or fork and hit "Deploy to Vercel" — Neon Postgres is auto-provisioned. You'll need a Vercel OAuth app, a GitHub App (for repo access), and BETTER_AUTH_SECRET for session signing. The .env.example has every variable documented.
Calling the agent from your own code:
import { openAgent } from "@open-agents/agent";

const result = await openAgent.run({
  prompt: "Add input validation to the /api/users endpoint",
  options: {
    // `state` is a live sandbox handle obtained from the sandbox package
    sandbox: { state, workingDirectory: "/vercel/sandbox" },
    model: "anthropic/claude-opus-4.6",
    customInstructions: "Follow the existing error handling patterns",
  },
});
Rough edges
The repo is explicitly a reference app, not a library, and that shows in a few places:
Vercel lock-in is real. The sandbox abstraction is clean in theory, but packages/sandbox/vercel/sandbox.ts imports directly from @vercel/sandbox and relies on Vercel-specific network policy APIs for GitHub credential brokering. Running this outside Vercel means reimplementing the sandbox adapter (roughly the surface sketched after this list).
No real test coverage. There's models.test.ts testing provider option merging and tools.test.ts covering path security, but the agent loop, workflow integration, and sandbox orchestration have no tests. The bun run ci command runs them, but there's not much there.
Subagent count is minimal. Three subagent types cover most cases, but the registry pattern in registry.ts is open for extension. The design subagent's "distinctive, production-grade frontend" positioning is aspirational — what that means in practice depends entirely on what you put in its system prompt.
ElevenLabs voice input is listed as a capability, but the implementation is gated behind an optional env var and isn't documented beyond that.
Recent commits (#870) hardened security findings from a DeepSec audit — input validation, Redis rate limiter fixes, blocking internal hosts in web fetch. Active maintenance on the security surface is a good sign for a project this young.
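For scale, the surface a non-Vercel adapter would need to cover is roughly this. A sketch inferred from the capabilities described in this post (file ops, exec, snapshot control), not the repo's actual interface.ts:

// Rough shape of the sandbox interface a Docker adapter would implement.
interface SandboxAdapter {
  // Run a command inside the VM and capture its output.
  exec(command: string, opts?: { cwd?: string }): Promise<{ stdout: string; stderr: string; exitCode: number }>;
  readFile(path: string): Promise<string>;
  writeFile(path: string, contents: string): Promise<void>;
  // Hibernate/resume, so the VM lifecycle can detach from the workflow.
  snapshot(): Promise<string>;
  resume(snapshotId: string): Promise<void>;
}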
Bottom line
If you're building a coding agent and want a real starting point instead of another toy demo, fork this. The agent/sandbox separation is the right call, the subagent dispatch pattern is clean, and the model gateway abstraction saves you from writing per-provider configuration for every new model you want to try. Just go in knowing it's designed for Vercel and not easily portable away from it.
