mem0 is a memory layer you drop in front of any LLM. Pass it a conversation, it extracts the facts that matter, compares them against what it already knows about that user or agent, and decides what to ADD, UPDATE, DELETE, or ignore — all through a second LLM call. The result is a queryable memory store that grows and corrects itself over time.
Why I starred it
The obvious approach to "persistent AI memory" is to stuff ever-growing conversation history into the context window. That works until it doesn't — latency spikes, costs compound, and you hit token limits. The other common approach is RAG over raw logs, which means the model has to wade through repetitive and contradictory conversation turns every time.
mem0 takes a different path: it extracts and normalizes facts, then explicitly resolves conflicts. If the user says "I prefer Python" in one session and "I've switched to TypeScript" in the next, the memory reflects the update, not both statements. Their LOCOMO benchmark results claim +26% accuracy over OpenAI Memory and 90% fewer tokens than full-context retrieval. I haven't reproduced those numbers, but the architecture makes the claim plausible.
How it works
The core logic lives in mem0/memory/main.py. The add() method is the interesting part — it's a two-LLM-call pipeline.
Step 1: fact extraction. The conversation is passed to an LLM using a carefully tuned system prompt from mem0/configs/prompts.py:
# FACT_RETRIEVAL_PROMPT instructs the LLM to extract structured facts
# Input: "Hi, my name is John. I am a software engineer."
# Output: {"facts": ["Name is John", "Is a Software engineer"]}
The prompt is opinionated about what counts as a fact: personal preferences, plans, health data, professional details. Generic chitchat returns an empty list. The LLM responds in JSON and mem0 falls back to a regex-based JSON extractor (extract_json()) if the model gets chatty around the JSON.
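That fallback matters in practice, since local models often wrap their answer in prose. A minimal sketch of the idea — this is illustrative, not mem0's actual extract_json() implementation:

```python
import json

def extract_json_fallback(text: str) -> dict:
    """Pull a JSON object out of a chatty LLM response.

    Sketch only: grab the outermost {...} span and parse it, returning
    an empty fact list on failure so the pipeline degrades gracefully.
    """
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return {"facts": []}
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return {"facts": []}
```

The graceful-empty-list default means a malformed extraction skips the turn rather than crashing the add() call.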
Step 2: conflict resolution. Each extracted fact gets embedded and searched against the existing vector store for that user_id. The retrieved memories plus the new facts go into a second LLM call via get_update_memory_messages():
# The LLM returns a structured diff:
{
  "memory": [
    {"id": "3", "text": "Switched to TypeScript", "event": "UPDATE", "old_memory": "Prefers Python"},
    {"id": "new", "text": "Senior engineer at Acme", "event": "ADD"},
    {"id": "7", "text": "...", "event": "NONE"}
  ]
}
Back in _add_to_vector_store(), mem0 walks this list and dispatches to _create_memory(), _update_memory(), or _delete_memory() accordingly. There's a subtle but important detail here: existing memory IDs are temporarily remapped to integers before being sent to the LLM, and _resolve_mapped_id() translates them back. This prevents UUID hallucination — a real failure mode where the LLM invents IDs that don't exist. The fix landed in a recent commit after issue #3931.
Graph memory is a separate optional layer in mem0/memory/graph_memory.py. It builds a Neo4j knowledge graph alongside the vector store, extracting entities and relations from conversations using LangChain's Neo4j integration. It's opt-in and has its own similarity threshold (default 0.7) for deduplication.
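That deduplication step boils down to a cosine-similarity gate. A toy illustration of how a 0.7 threshold behaves — this is not the actual graph-memory code, just the shape of the check:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def is_duplicate(new_vec, existing_vecs, threshold=0.7):
    """Skip inserting an entity whose embedding is close to an existing one."""
    return any(cosine_similarity(new_vec, v) >= threshold for v in existing_vecs)

existing = [[1.0, 0.0], [0.0, 1.0]]
print(is_duplicate([0.9, 0.1], existing))     # near [1, 0] -> True
print(is_duplicate([0.5, -0.866], existing))  # far from both -> False
```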
The factory pattern in mem0/utils/factory.py handles provider abstraction. LlmFactory.create() supports 18 providers — OpenAI, Anthropic, Gemini, Ollama, AWS Bedrock, Azure, LM Studio, vLLM, and more — all resolved at instantiation time via importlib.import_module(). Same pattern for embedders, vector stores, and rerankers.
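The factory itself is not much code. A simplified sketch of the importlib-based dispatch (provider table abbreviated; the class paths are illustrative, not verified against the repo):

```python
import importlib

# Maps provider name -> "module.path.ClassName" (abbreviated, illustrative)
PROVIDER_CLASSES = {
    "openai": "mem0.llms.openai.OpenAILLM",
    "anthropic": "mem0.llms.anthropic.AnthropicLLM",
}

def load_class(dotted_path: str):
    """Resolve a class at instantiation time rather than import time,
    so optional provider dependencies are only imported when used."""
    module_path, class_name = dotted_path.rsplit(".", 1)
    module = importlib.import_module(module_path)
    return getattr(module, class_name)

def create_llm(provider: str, **config):
    try:
        cls = load_class(PROVIDER_CLASSES[provider])
    except KeyError:
        raise ValueError(f"Unsupported provider: {provider}")
    return cls(**config)
```

The payoff of deferred import is that installing mem0 doesn't pull in all 18 providers' SDKs; you only pay for the one you configure.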
Using it
Self-hosted with defaults (Qdrant in-process, OpenAI embeddings):
from mem0 import Memory
m = Memory()
# Ingest a conversation turn
result = m.add(
    [{"role": "user", "content": "I love hiking and I'm training for a marathon"}],
    user_id="alice",
)
# result: [{"id": "abc123", "memory": "Loves hiking", "event": "ADD"}, ...]
# Retrieve relevant memories for the next turn
memories = m.search("What does Alice like to do outside?", user_id="alice")
# returns ranked list of MemoryItem with score
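Wiring search() into the next turn is the part the quickstart leaves implicit. One common pattern — the prompt wording and the shape of the memory dicts here are my assumptions, not mem0's API:

```python
def build_prompt(user_message: str, memories: list[dict]) -> list[dict]:
    """Prepend retrieved memories as a system message for the next LLM call."""
    memory_lines = "\n".join(f"- {m['memory']}" for m in memories)
    system = (
        "You are a helpful assistant. Relevant facts about this user:\n"
        f"{memory_lines}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_message},
    ]

msgs = build_prompt(
    "Any weekend plans you'd suggest?",
    [{"memory": "Loves hiking"}, {"memory": "Training for a marathon"}],
)
```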
Swap in Anthropic and a custom vector store with a config object:
from mem0 import Memory
from mem0.configs.base import MemoryConfig
config = MemoryConfig(
    llm={"provider": "anthropic", "config": {"model": "claude-3-5-sonnet-20241022"}},
    vector_store={"provider": "pgvector", "config": {"host": "localhost", "dbname": "mem0"}},
)
m = Memory(config=config)
The async variant (AsyncMemory) mirrors the sync API exactly — same methods, same config. Both support context manager syntax for cleanup.
Rough edges
Tests cover happy paths, not edge cases. tests/test_main.py exercises the primary add/search/delete flow but leans heavily on mocks. The graph memory tests in test_graph_delete.py require a live Neo4j instance — there's a Docker variant and an e2e variant, but no in-memory substitute for local development.
The two-LLM-call overhead is real. Every add() call makes at least two LLM requests — one for extraction, one for conflict resolution. For high-frequency agents this can add meaningful latency and cost. The infer=False flag bypasses inference and stores messages verbatim, which avoids the cost but defeats the purpose.
Hosted vs. self-hosted divergence. The platform product at mem0.ai has features (analytics, teams, access controls) that don't exist in the OSS package. The MemoryClient in mem0/client/main.py is just a thin HTTP wrapper around the hosted API — it shares no code with the local Memory class. That's fine, but if you build on self-hosted and later want to migrate, there's no clean path.
Custom prompts require care. MemoryConfig accepts custom_fact_extraction_prompt and custom_update_memory_prompt, but the update prompt must produce exactly the JSON schema that the dispatch loop expects (event ∈ {ADD, UPDATE, DELETE, NONE}). There's no validation — a prompt that returns different field names silently drops memories.
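If you do override the update prompt, it's worth validating the LLM's output yourself before it reaches the dispatch loop. A minimal guard you could bolt on — this is illustrative, not part of mem0:

```python
VALID_EVENTS = {"ADD", "UPDATE", "DELETE", "NONE"}

def validate_memory_diff(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the diff is usable."""
    problems = []
    entries = payload.get("memory")
    if not isinstance(entries, list):
        return ["top-level 'memory' key missing or not a list"]
    for i, entry in enumerate(entries):
        if "text" not in entry:
            problems.append(f"entry {i}: missing 'text'")
        event = entry.get("event")
        if event not in VALID_EVENTS:
            problems.append(f"entry {i}: bad event {event!r}")
    return problems

assert validate_memory_diff(
    {"memory": [{"id": "1", "text": "Prefers Python", "event": "ADD"}]}
) == []
assert validate_memory_diff({"memories": []})  # wrong top-level key -> flagged
```

Logging the returned problems turns "silently drops memories" into a visible failure, which is what you want when iterating on a custom prompt.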
Bottom line
If you're building an AI assistant, customer support bot, or any agent that needs to remember things across sessions, mem0's extract-and-diff approach is cleaner than raw RAG over conversation logs. The architecture is sound, the provider support is broad, and active maintenance is visible. The hosted API is the easiest entry point; self-hosted is viable if you're comfortable with Qdrant and don't need graph memory.
