deepsec: agent-powered vulnerability scanner for large codebases

deepsec is a security scanner that breaks the work into two clearly separated stages: fast regex-based candidate discovery, then AI agent investigation. The scanner runs across your whole codebase for free; the AI only sees files that already look suspicious.

Why I starred it

Most static analysis tools give you one of two things: a flood of low-confidence findings from pattern matching, or expensive AI review that burns budget on every file regardless of relevance. deepsec sequences them — regex first, AI second — and the architecture makes that split explicit in the CLI commands, the on-disk schema, and the code.

What caught my eye was the env allowlist in packages/processor/src/agents/claude-agent-sdk.ts. When the agent spawns Claude Code to investigate files, it constructs a minimal environment from a hardcoded allowlist of safe variables plus only the four credential keys the SDK actually needs. Your GITHUB_TOKEN, AWS_* variables, and everything else in your shell are dropped before the subprocess sees anything. That's not the default; someone thought about what prompt injection looks like when your source code contains attacker-controlled content.
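The mechanism is easy to sketch. This is a minimal illustration under assumed names (SAFE_ENV_VARS, CREDENTIAL_KEYS, buildSubprocessEnv are mine, and the credential keys shown are placeholders), not the repo's actual code:

```typescript
// Sketch of an env allowlist for a spawned agent subprocess.
// The real list lives in packages/processor/src/agents/claude-agent-sdk.ts;
// these names and keys are illustrative assumptions.
const SAFE_ENV_VARS = ["PATH", "HOME", "LANG", "TERM", "TMPDIR"];
const CREDENTIAL_KEYS = ["ANTHROPIC_API_KEY"]; // placeholder for the SDK's keys

export function buildSubprocessEnv(
  parentEnv: Record<string, string | undefined>
): Record<string, string> {
  const env: Record<string, string> = {};
  for (const key of [...SAFE_ENV_VARS, ...CREDENTIAL_KEYS]) {
    const value = parentEnv[key];
    if (value !== undefined) env[key] = value;
  }
  // Everything else (GITHUB_TOKEN, AWS_*, ...) never reaches the subprocess.
  return env;
}
```

The key property is that the allowlist is closed: anything not explicitly named is invisible to the child process, so a prompt-injected `Bash` call can't exfiltrate credentials that were never passed in.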

How it works

The four packages map cleanly onto the pipeline:

  • @deepsec/scanner — regex matchers, tech detection, file record persistence
  • @deepsec/processor — AI agent orchestration, prompt assembly, batch management
  • @deepsec/core — shared schemas, file record I/O, plugin registry
  • deepsec — CLI commands that wire everything together

The scanner

I opened packages/scanner/src/index.ts and traced the scan path. The RegexScannerDriver pre-globs all unique file patterns at once to avoid running the same glob 80 times for 80 matchers, caches file contents in a Map<string, string>, and normalizes CRLF line endings before any regex runs. The output is a set of FileRecord JSON files written to data/<projectId>/files/. Each record accumulates candidates additively — re-scanning merges new hits rather than replacing them.
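The additive merge is worth pinning down, since it's what makes re-scans safe to run repeatedly. A minimal sketch, assuming simplified shapes for Candidate and FileRecord (the real schema lives in @deepsec/core):

```typescript
// Sketch of additive candidate merging on re-scan.
// Candidate/FileRecord shapes are assumptions, not the real schema.
interface Candidate { matcherId: string; line: number; label: string }
interface FileRecord { path: string; candidates: Candidate[] }

export function mergeCandidates(
  existing: FileRecord,
  fresh: Candidate[]
): FileRecord {
  const seen = new Set(
    existing.candidates.map((c) => `${c.matcherId}:${c.line}`)
  );
  const added = fresh.filter((c) => !seen.has(`${c.matcherId}:${c.line}`));
  // Re-scanning appends new hits; previously recorded ones are never dropped.
  return { ...existing, candidates: [...existing.candidates, ...added] };
}
```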

The matcher gate system is the subtle part. Each matcher declares a requires clause with either a tech tag (detected from lockfiles and manifests) or a sentinelFiles glob. evaluateGate() in scanner/src/index.ts checks these at scan time — matchers for Laravel, Apex, or Crystal only run when those are actually present. Gates are unions, not intersections: tech OR sentinel passing is enough.
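The union semantics can be sketched in a few lines. This is my own reconstruction under assumed types, not the actual evaluateGate from scanner/src/index.ts:

```typescript
// Sketch of gate evaluation with OR semantics; the Gate shape and the
// sentinelMatched callback are assumptions for illustration.
interface Gate { tech?: string[]; sentinelFiles?: string[] }

export function evaluateGate(
  gate: Gate | undefined,
  detectedTech: Set<string>,
  sentinelMatched: (glob: string) => boolean
): boolean {
  if (!gate) return true; // ungated matchers always run
  const techHit = gate.tech?.some((t) => detectedTech.has(t)) ?? false;
  const sentinelHit = gate.sentinelFiles?.some(sentinelMatched) ?? false;
  // Union, not intersection: either condition opens the gate.
  return techHit || sentinelHit;
}
```

So a Laravel matcher fires when either the lockfile-based tech detection tags the repo as Laravel or a sentinel file like an artisan script is present, without requiring both.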

The cache-key-poisoning matcher in packages/scanner/src/matchers/cache-key-poisoning.ts is a good example of the pattern. It's tagged noiseTier: "precise" and fires on specific combinations like cache.key.host or redis.set with request-derived arguments, with explicit examples in the source to document what each regex is targeting:

{
  regex: /cache.{0,80}key.{0,80}host|cache_key.{0,80}host/i,
  label: "Cache key includes Host header",
},
{
  regex: /redis\.\w*set\s*\([^)]{0,200}req\.|redis\.\w*set\s*\([^)]{0,200}host/i,
  label: "Redis set with request-derived key",
},

The processor

packages/processor/src/agents/claude-agent-sdk.ts spawns Claude Code via @anthropic-ai/claude-agent-sdk with effort: "max" and thinking: { type: "adaptive" }. The agent gets ["Read", "Glob", "Grep", "Bash"] tools and permissionMode: "dontAsk", so it can read source files without prompting. Batches of files are investigated concurrently; a QuotaExhaustedError aborts all in-flight batches immediately via AbortController rather than letting them drain.
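The fail-fast cancellation pattern is the interesting part of that paragraph. A sketch of the idea, with a hypothetical runBatches and investigate signature rather than the processor's real API:

```typescript
// Sketch of fail-fast batch cancellation on quota exhaustion.
// runBatches/investigate are illustrative; only the AbortController
// pattern mirrors what the text describes.
class QuotaExhaustedError extends Error {}

export async function runBatches<T>(
  batches: T[][],
  investigate: (batch: T[], signal: AbortSignal) => Promise<void>
): Promise<void> {
  const controller = new AbortController();
  const results = await Promise.allSettled(
    batches.map(async (batch) => {
      try {
        await investigate(batch, controller.signal);
      } catch (err) {
        // One exhausted quota aborts every in-flight batch immediately
        // instead of letting them drain.
        if (err instanceof QuotaExhaustedError) controller.abort();
        throw err;
      }
    })
  );
  const failure = results.find((r) => r.status === "rejected");
  if (failure && failure.status === "rejected") throw failure.reason;
}
```

Each investigation receives the shared AbortSignal, so in-flight SDK calls can observe the abort and stop spending tokens rather than running to completion.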

The prompt in packages/processor/src/prompt/core.ts is sharp: it includes detailed false-positive guidance, an explicit table of known vulnerability slugs, and a list of auth bypass patterns to look for beyond just "missing auth check". To keep CORE_PROMPT from growing bloated, framework-specific context is routed into per-tech highlights that are only injected when that tech is detected.

After investigation, revalidate re-reads each finding against git history and emits a verdict: true-positive, false-positive, fixed, or uncertain. The docs claim this step empirically cuts the false-positive rate by 50% or more.

The sandbox

For large repos, deepsec sandbox process fans the work across Vercel Sandbox microVMs. The orchestrator in packages/deepsec/src/sandbox/orchestrator.ts tarballs your repo (excluding .git), partitions pending FileRecord batches, spawns one sandbox per partition, and merges results back via download.ts. The repo tarball and the .deepsec/ workspace tarball ship separately — the workspace gets re-installed from npm inside each VM, so the AI credential keys are injected via Vercel Sandbox environment rather than included in the tarball at all.
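Partitioning pending batches across microVMs is conceptually simple; a round-robin split keeps the sandboxes balanced. A sketch under assumed names (partition is mine, not orchestrator.ts's API):

```typescript
// Sketch of partitioning pending file batches across N sandboxes;
// the function name and shape are illustrative assumptions.
export function partition<T>(items: T[], sandboxCount: number): T[][] {
  const parts: T[][] = Array.from({ length: sandboxCount }, () => []);
  // Round-robin keeps partitions balanced even when item counts
  // don't divide evenly across sandboxes.
  items.forEach((item, i) => parts[i % sandboxCount].push(item));
  return parts.filter((p) => p.length > 0);
}
```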

Using it

The typical flow inside .deepsec/:

# Scan the target repo: fast, no AI cost
pnpm deepsec scan --project-id my-app

# Inspect what the scanner found
pnpm deepsec status

# Investigate candidates with the AI agent
pnpm deepsec process --project-id my-app --concurrency 5 --batch-size 5

# Re-check for false positives
pnpm deepsec revalidate

# Export findings as markdown files
pnpm deepsec export --format md-dir --out ./findings

For PR review, process --diff scans only files changed in a git diff, writes a FileRecord even for files with no regex hits, and lets the AI investigate them holistically. The IGNORE_DIRS list in scanner/src/index.ts is exported so the diff mode can apply the same filter:

export const IGNORE_DIRS = [
  "**/node_modules/**",
  "**/.git/**",
  "**/__tests__/**",
  "**/*.test.{ts,tsx,js,jsx}",
  // ...
];

Tests are excluded from AI review by default. That's a deliberate call — dist/ and test files burn budget investigating code that's not production.

Rough edges

The documentation warns upfront that full scans on large codebases cost thousands of dollars. That's honest, but the tooling for previewing cost before running is thin — status shows pending file counts, but there's no dry-run estimate of what a process pass will spend before you commit.

The init flow is odd: you run npx deepsec init, then open a coding agent and paste a prompt that has it read SKILL.md and populate INFO.md. This works, but it's more ceremony than a first-run wizard would need. The resulting INFO.md, a 50-100 line project summary injected into every AI batch, is central to scan quality, so it's not a step you'd want to skip.

There's no public SaaS version. This runs on your own infrastructure against your own AI gateway keys. That's actually the right call for a security tool you're pointing at your source code, but it's worth noting for teams that want a fully managed option.

The test suite in packages/deepsec/src/__tests__/ is thorough on the credential-brokering, sandbox merge, and formatter logic — but scanner matcher coverage and end-to-end process flows are mostly covered by e2e tests that require live API credentials to run.

Bottom line

deepsec is the right tool if you're running a serious codebase review where you want to go beyond semgrep patterns and have AI actually read the code in context. The two-stage design keeps costs controlled and the security model around the spawned agent subprocess is more thought-through than most tools in this space.

vercel-labs/deepsec on GitHub