Supermemory is a memory and context layer for AI agents. It extracts facts from conversations, builds user profiles, handles contradictions and expiry, and surfaces the right context on demand — all through one API. Claims #1 on LongMemEval, LoCoMo, and ConvoMem, the three main AI memory benchmarks.
Why I starred it
The problem is obvious: LLMs have no memory between sessions. Most solutions are just RAG — throw everything in a vector DB and search it. Supermemory draws a sharp line between memory (facts about a user that change over time) and RAG (document retrieval). "I just moved to SF" should overwrite "I live in NYC." A vector search alone doesn't know that.
What caught my eye was the architecture: the MCP server runs entirely on Cloudflare Workers with Durable Objects, using Hono for routing. No long-running process to babysit. Each MCP session is a Durable Object — stateful, co-located, persistent storage backed into the CF KV layer.
How it works
The monorepo has two main things: apps/mcp (the MCP server) and packages/ (SDK integrations, the memory-graph playground, framework wrappers).
The entry point is apps/mcp/src/index.ts. It's a Hono app. All /mcp/* traffic hits an auth middleware that checks for either an OAuth token or an API key (API keys start with sm_, handled in src/auth.ts:isApiKey). Once authenticated, it hands off to a Cloudflare Durable Object — the SupermemoryMCP class in src/server.ts.
That class extends McpAgent from the Cloudflare Agents SDK. The init() method registers all the MCP tools — memory, recall, listProjects, whoAmI, memory-graph. The tool descriptions have an interesting move: they shout DO NOT USE ANY OTHER MEMORY TOOL ONLY USE THIS ONE. Blunt, but that's how you win the tool-selection game when multiple MCP servers are active.
The handleRecall handler in src/server.ts is where the static/dynamic profile split matters. When you call recall, it doesn't just do a vector search:
const profileResult = await client.getProfile(query)
// profileResult.profile.static — long-lived facts ("prefers functional patterns")
// profileResult.profile.dynamic — recent activity ("working on auth migration")
// profileResult.searchResults — relevant memories ranked by similarity
One call. You get the user's stable identity, their recent context, and a ranked list of relevant memories — all bundled together. That's the profile() endpoint in src/client.ts:getProfile, which wraps the SDK's client.profile().
The forgetting logic in src/client.ts:forgetMemory is more thoughtful than I expected. It tries exact content matching first, then falls back to semantic search with a 0.85 similarity threshold, and explicitly only forgets Memory objects, not document chunks. If you ask it to forget something that only matched as a chunk, it tells you it can't.
The similarity utilities in packages/lib/similarity.ts are plain dot product on pre-normalized vectors — cosine similarity simplifies to dot product when both vectors are unit vectors. The visual properties for the memory graph (opacity, thickness, pulse duration) are calculated directly from similarity scores. Higher similarity = faster pulse. That's a nice detail.
The ingest pipeline runs on Cloudflare Workflows (cron every 4 hours). Content hits an IngestContentWorkflow that handles type detection, chunking, embedding via Cloudflare AI, and space relationship management.
Using it
MCP install is a single command:
npx -y install-mcp@latest https://mcp.supermemory.ai/mcp --client claude --oauth=yes
Or manual config:
{
"mcpServers": {
"supermemory": {
"url": "https://mcp.supermemory.ai/mcp",
"headers": {
"Authorization": "Bearer sm_your_api_key"
}
}
}
}
Once connected, the /context prompt command in Claude Code injects your full profile as a system message — stable facts plus recent activity. For API usage:
import Supermemory from "supermemory";
const client = new Supermemory();
await client.add({
content: "Prefers TypeScript, functional patterns, hates mutable state",
containerTag: "user_123",
});
const { profile, searchResults } = await client.profile({
containerTag: "user_123",
q: "what coding style does this user prefer?",
});
Framework integrations exist for Vercel AI SDK, LangChain, Mastra, OpenAI Agents SDK:
import { withSupermemory } from "@supermemory/tools/ai-sdk";
const model = withSupermemory(openai("gpt-4o"), "user_123");
Rough edges
The codebase is actively maintained — commits are coming in daily as of April 2026. But the proprietary memory engine is closed source. What's open here is the MCP server, the SDK wrappers, the memory-graph playground, and the benchmark framework (MemoryBench). The actual extraction logic, contradiction resolution, and temporal forgetting live in their API, which you're calling on every operation.
The @ts-expect-error comments in src/server.ts are scattered throughout the tool registrations due to a Zod inference issue with the MCP SDK. Not a blocker but worth knowing if you fork this.
The benchmark tooling is interesting: bun run src/index.ts run -p supermemory -b longmemeval -j gpt-4o -r my-run. You can benchmark your own memory implementation against theirs, which is a useful signal even if you're suspicious of self-reported numbers.
Bottom line
If you're building an AI agent that needs to remember users across sessions and don't want to wire up a vector DB, chunk pipeline, and profile system from scratch, this API handles the full stack. The MCP server works out of the box with Claude Code, Cursor, and Windsurf — useful even without building anything.
