codemap is a Go CLI that gives AI coding agents structural awareness of your codebase — hub files, diff summaries, dependency graphs, intent classification — packaged as a ContextEnvelope that any LLM can consume.
Why I starred it
The problem is familiar: you open a session with Claude Code or Codex, and the first thing the agent does is read half your codebase to figure out where things live. The bigger the repo, the worse the token burn and the worse the quality of the context that actually lands in the window.
codemap doesn't solve this by making agents smarter. It solves it by front-loading the structural work: which files have 12 importers (the hubs), what changed since main, what the dependency graph looks like — all pre-computed, budgeted, and serialized into a format agents can query without doing the exploration themselves.
The hook integration is what clinched it for me. You run codemap setup once inside your project and it wires into Claude Code's settings.local.json — so at session start, before every edit, after every edit, your agent already knows the lay of the land.
How it works
The entry point is main.go, which dispatches subcommands like watch, hook, and handoff before standard flag parsing. The core architectural output is a ContextEnvelope defined in cmd/context.go:
```go
type ContextEnvelope struct {
	Version     int                `json:"version"`
	GeneratedAt time.Time          `json:"generated_at"`
	Project     ProjectContext     `json:"project"`
	Intent      *TaskIntent        `json:"intent,omitempty"`
	WorkingSet  *WorkingSetContext `json:"working_set,omitempty"`
	Skills      []SkillRef         `json:"skills,omitempty"`
	Handoff     *HandoffRef        `json:"handoff,omitempty"`
	Budget      BudgetInfo         `json:"budget"`
}
```
That Budget field is where the design gets interesting. The limits package in limits/limits.go enforces hard caps: MaxContextOutputBytes = 60000 (~15k tokens, under 10% of a 200k context window). Tree depth scales with repo size via AdaptiveDepth() — large repos (>5000 files) get depth 2, medium repos depth 3, small repos depth 4. This is the kind of decision most tools leave as an exercise for the user.
The intent classification in cmd/intent.go is a weighted keyword scorer — no ML, just categoryDefs with phrases and weights:
```go
var categoryDefs = []categoryDef{
	{
		Category: "explore",
		Signals: []intentSignal{
			{"how does", 5}, {"where is", 5}, {"what uses", 5}, {"what is", 4},
			{"show me", 4}, {"walk me through", 5},
		},
	},
	{
		Category: "refactor",
		Signals: []intentSignal{
			{"refactor", 5}, {"rename", 4}, {"restructure", 5},
			{"clean up", 4}, {"extract", 3}, {"split", 3},
		},
	},
	// ...
}
```
Simple, fast, zero dependencies. For the prompt-submit hook this fires on every message, so keeping it synchronous matters.
The multi-agent handoff in handoff/ is the most architecturally interesting part. The Artifact struct (handoff/types.go) splits context into a PrefixSnapshot (stable hub summaries, file count — changes rarely) and a DeltaSnapshot (changed files, risk files, recent events — changes per session). Both get SHA-256 hashed independently, and the CacheMetrics field tracks the reuse ratio across handoff saves:
```go
type CacheMetrics struct {
	PrefixBytes    int     `json:"prefix_bytes"`
	DeltaBytes     int     `json:"delta_bytes"`
	UnchangedBytes int     `json:"unchanged_bytes"`
	ReuseRatio     float64 `json:"reuse_ratio"`
	PrefixReused   bool    `json:"prefix_reused"`
	DeltaReused    bool    `json:"delta_reused"`
}
```
If the prefix hasn't changed between agents, only the delta needs to be transmitted. This matters when you're switching from Claude to Codex mid-session — you're not re-briefing from scratch, you're sending a delta.
The hook output uses cappedStringWriter in cmd/hooks.go — a writer that stops accepting bytes past a configurable max but still returns len(p) to avoid blocking subprocess writers. Small touch, but it prevents hooks from stalling a post-edit event if the codebase output explodes.
Using it
Setup is one command from your repo root:
```
codemap setup
```
That writes .codemap/config.json and injects hooks into .claude/settings.local.json. After a session restart, every Claude Code interaction gets project context at the top.
The blast-radius subcommand is useful for pre-flight checks before a big refactor:
```
codemap blast-radius --ref main .
```

That emits something like:
```
╭─────────────────────────── myproject ──────────────────────────╮
│ Changed: 4 files | +156 -23 lines vs main                      │
╰────────────────────────────────────────────────────────────────╯
├── api/
│   ├── (new) auth.go
│   └── ✎ handlers.go (+45 -12)
└── ✎ main.go (+29 -3)

⚠ handlers.go is used by 3 other files
```
For Cursor, Windsurf, or anything that can shell out:
```
codemap context --for "refactor auth"
```
Returns a full JSON ContextEnvelope with the intent pre-classified, matched skills surfaced, and working set populated.
The codemap serve HTTP API is a nice addition for tools that can't shell out directly — bind to 127.0.0.1:9471 and hit GET /api/context?intent=refactor+auth.
Rough edges
The --deps flag requires ast-grep as a separate binary — it's not bundled unless you grab the codemap-full release artifact. The README explains this, but it's easy to miss if you install via brew and then wonder why dependency flow isn't working.
Skills are markdown files with YAML frontmatter — straightforward to write, but the matching logic is keyword-based like intent classification. There's no semantic similarity. A skill about "authentication" won't match a prompt that says "login flow" unless "login" is in its keyword list.
The community skill registry is on the roadmap but not live yet. Right now, you can only share skills by copying markdown files.
Test coverage is solid for a solo project — main_test.go builds the binary first and runs integration tests against it, and the handoff package has dedicated test files. The coverage badge links to a Gist, which suggests it's tracked but not enforced in CI.
Bottom line
If you're running Claude Code or Codex on repos larger than a few hundred files, the context overhead is real and codemap addresses it directly. The hook integration is low-friction to set up, and the handoff artifact makes multi-agent workflows less painful.
