Zep vs RetainDB: Which AI Memory Solution Should You Choose?

Zep and RetainDB both use database-backed architectures for agent memory, but they differ in approach and focus. Zep uses a temporal knowledge graph (Graphiti) optimized for time-based reasoning, while RetainDB uses PostgreSQL with pgvector and chronological retrieval, optimizing specifically for preference recall and zero hallucination.

This comparison covers their architectures, benchmark performance, pricing, and ideal use cases to help you decide.

Quick Comparison

Factor	Zep	RetainDB
Architecture	Temporal knowledge graph (Graphiti)	PostgreSQL + pgvector with chronological retrieval
LongMemEval*	71.2%	79%
Deployment	Cloud-first; self-host requires Graphiti + graph DB	Cloud or self-hosted (PostgreSQL required)
Pricing	Free / $25 / $475 / Enterprise	Free (10K ops/mo) / Pro $20/mo
GitHub Stars	4.4K (Zep) + 24.8K (Graphiti)	8
Funding	Not disclosed	Not disclosed

What is Zep?

Zep uses Graphiti, a temporal knowledge graph where time is a first-class dimension. Every fact has valid_from, valid_to, and invalid_at markers, allowing queries like "what was true in January?" or "when did this change?"

Zep positions itself around "context engineering" rather than just memory. Graphiti, the underlying engine, has 24.8K stars and supports multiple graph backends (Neo4j, FalkorDB, Kuzu, Neptune).

Key strengths:

Best-in-class temporal reasoning
Multi-hop graph queries
<200ms retrieval latency
Graphiti is open source (24.8K stars)
Strong enterprise features (SOC2, HIPAA)

What is RetainDB?

RetainDB provides persistent memory with turn-by-turn extraction and chronological retrieval. It claims state-of-the-art performance on preference recall (88%) and a 0% hallucination rate on documentation questions.

RetainDB's pipeline processes every turn individually with 3-turn context, stores atomic memories with eventDate, documentDate, and confidence scores, and retrieves via full timeline dumps rather than lossy semantic search.
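The per-turn windowing described above can be sketched as follows. This is a minimal illustration, assuming "3-turn context" means the current turn plus up to two preceding turns (the source does not spell this out). `extraction_windows` is a hypothetical helper, not part of RetainDB's API.

```python
from typing import Iterator


def extraction_windows(
    turns: list[str], context_size: int = 3
) -> Iterator[tuple[str, list[str]]]:
    """Yield (current_turn, context) pairs, one per conversation turn.

    Assumption: the context is the current turn plus up to
    `context_size - 1` turns that came before it.
    """
    for i, turn in enumerate(turns):
        start = max(0, i - context_size + 1)
        yield turn, turns[start : i + 1]


turns = [
    "user: I moved to Berlin last month.",
    "assistant: Nice, how is it going?",
    "user: Good. I prefer morning standups now.",
    "assistant: Noted.",
]
windows = list(extraction_windows(turns))
```

Each window would then be handed to whatever extractor produces the atomic memories; that step is omitted here because the source does not describe it.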

Key strengths:

Claimed SOTA on preference recall (88%)
0% hallucination rate claimed
Reproducible benchmarks (runner ships in repo)
Affordable pricing ($20/mo Pro)
Full chronological retrieval (no lossy search)

Architecture Comparison

Zep's Approach

Zep's Graphiti engine stores entities as nodes and facts as edges in a knowledge graph with explicit temporal metadata. Each edge carries a validity window tracking when its fact became true and when it was superseded.

This temporal awareness is native to the architecture—not a filter applied after retrieval. The tradeoff is infrastructure complexity: self-hosting requires running Graphiti plus a graph database.
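A point-in-time query over validity windows might look like the sketch below. The field names `valid_from` and `valid_to` follow this article's description rather than Graphiti's actual schema, and `TemporalFact` and `facts_valid_at` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TemporalFact:
    subject: str
    predicate: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the fact is still true


def facts_valid_at(facts: list[TemporalFact], when: date) -> list[TemporalFact]:
    """Return facts whose validity window contains `when` (end exclusive)."""
    return [
        f
        for f in facts
        if f.valid_from <= when and (f.valid_to is None or when < f.valid_to)
    ]


facts = [
    TemporalFact("Emma", "works_at", "Acme", date(2023, 3, 1), date(2024, 2, 1)),
    TemporalFact("Emma", "works_at", "Globex", date(2024, 2, 1)),
]
january = facts_valid_at(facts, date(2024, 1, 15))  # "what was true in January?"
june = facts_valid_at(facts, date(2024, 6, 1))
```

In this toy data the superseded fact is kept with a closed window instead of being deleted, which is what makes the historical question answerable.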

RetainDB's Approach

RetainDB processes every conversation turn individually with 3-turn context for extraction. Atomic memory writes include eventDate, documentDate, and confidence scores. The key design choice: retrieval provides the complete memory chronology rather than using semantic search.

This "no lossy retrieval" philosophy means the answering model sees the full timeline of memories, not just semantically similar ones. This avoids the retrieval accuracy problem entirely—at the cost of providing more context than may be necessary.
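As a rough sketch of chronological retrieval, the code below stores atomic memories and renders all of them in date order. The reading of `event_date` as "when the event happened" and `document_date` as "when the turn was recorded" is an assumption, and `AtomicMemory` and `chronological_dump` are hypothetical names, not RetainDB's API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AtomicMemory:
    text: str
    event_date: date      # assumed: when the described event happened
    document_date: date   # assumed: when the turn was recorded
    confidence: float


def chronological_dump(memories: list[AtomicMemory]) -> str:
    """Render every memory, oldest event first; nothing is filtered out."""
    ordered = sorted(memories, key=lambda m: (m.event_date, m.document_date))
    return "\n".join(
        f"[{m.event_date.isoformat()}] {m.text} (confidence={m.confidence:.2f})"
        for m in ordered
    )


memories = [
    AtomicMemory("Emma uses Notion for notes", date(2024, 3, 2), date(2024, 3, 2), 0.9),
    AtomicMemory("Emma prefers morning standups", date(2024, 1, 10), date(2024, 1, 12), 0.95),
]
context = chronological_dump(memories)
```

The resulting string is what the answering model would receive in full. There is no ranking or similarity step that could drop a memory.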

The Key Difference

Zep uses graph structure for temporal reasoning; RetainDB uses chronological dumps for completeness.

When a user asks about their preferences, Zep traverses the knowledge graph to find relevant nodes and their temporal edges. If the graph extraction missed the preference, it won't be retrieved.

RetainDB provides the complete chronological memory dump to the answering model. The preference is guaranteed to be present (if it was extracted at the turn level), but the model must sift through all memories to find it.

The tradeoff: Zep's graph offers precise queries but depends on extraction quality. RetainDB's full dumps guarantee coverage but depend on the model's ability to parse large contexts.
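To make the tradeoff concrete, the sketch below compares the context size of a full dump with that of a targeted lookup. The four-characters-per-token figure is only a rough heuristic, the memories are synthetic, and the keyword filter is a simplified stand-in for a targeted query, not how Zep actually retrieves.

```python
def approx_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (a heuristic)."""
    return max(1, len(text) // 4)


def dump_cost(memories: list[str]) -> int:
    """Tokens needed to pass every memory to the answering model."""
    return sum(approx_tokens(m) for m in memories)


def targeted_cost(memories: list[str], keyword: str) -> int:
    """Tokens needed if only memories mentioning `keyword` are passed.

    A miss returns nothing at all, which mirrors the coverage risk.
    """
    return sum(approx_tokens(m) for m in memories if keyword.lower() in m.lower())


memories = [f"Synthetic memory number {i} about project status" for i in range(500)]
memories.append("Emma prefers morning standups")

full = dump_cost(memories)                   # grows with every stored memory
hit = targeted_cost(memories, "standups")    # small, but only if the lookup matches
miss = targeted_cost(memories, "stand-up")   # phrasing mismatch finds nothing
```

The dump always contains the preference but its cost scales with history length. The targeted lookup stays cheap but can come back empty.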

Benchmark Performance

Benchmark	Zep	RetainDB
LongMemEval*	71.2%	79%
Preference Recall	—	88% (claimed SOTA)

RetainDB outperforms Zep by 7.8 percentage points overall on LongMemEval (79% vs 71.2%), and its 88% preference recall score is notably strong. Using Zep's independently evaluated score (63.8%), the gap widens to 15.2 points.
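The arithmetic behind these gaps, using only the scores quoted in this article, works out as follows. The dictionary keys are labels chosen for this example.

```python
scores = {
    "Zep (self-reported)": 71.2,
    "Zep (independent)": 63.8,
    "RetainDB": 79.0,
}

# Differences are in percentage points, not relative percent.
gap_self_reported = round(scores["RetainDB"] - scores["Zep (self-reported)"], 1)
gap_independent = round(scores["RetainDB"] - scores["Zep (independent)"], 1)

print(gap_self_reported)  # 7.8
print(gap_independent)    # 15.2
```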

RetainDB's claimed 0% hallucination rate is notable: because the answering model receives the full chronology, there is no failed retrieval for it to fabricate around. The figure is self-reported and applies to documentation questions.

Both score significantly below Hypabase (87.4%), which achieves 100% on personalization tasks through AMR-based extraction.

Pricing Comparison

Zep

Tier	Price	Limits
Free	$0	1K episodes/month
Flex	$25/month	20K credits, 600 req/min
Flex Plus	$475/month	300K credits, 1K req/min, webhooks
Enterprise	Custom	SOC2, HIPAA, dedicated support

RetainDB

Tier	Price	Limits
Free	$0	10K ops/month
Pro	$20/month	100K queries

RetainDB's listed prices are lower: $20/month for Pro versus $25 to $475 for Zep's paid tiers. Its free tier also has a higher nominal limit (10K ops vs 1K episodes), although ops, episodes, credits, and queries are different units and are not directly comparable. For budget-conscious teams, RetainDB is the cheaper entry point.
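For a rough sense of unit pricing, the sketch below divides each paid tier's monthly price by its included allowance, using the figures from the tables above. A Zep credit and a RetainDB query are not the same unit (the article does not define either), so the numbers are indicative only.

```python
# (monthly price in USD, included units, unit name) from the pricing tables
plans = {
    "Zep Flex": (25.0, 20_000, "credits"),
    "Zep Flex Plus": (475.0, 300_000, "credits"),
    "RetainDB Pro": (20.0, 100_000, "queries"),
}


def price_per_thousand(price_usd: float, units: int) -> float:
    """Monthly price divided by included units, expressed per 1,000 units."""
    return round(price_usd / units * 1000, 2)


unit_prices = {
    name: (price_per_thousand(price, units), kind)
    for name, (price, units, kind) in plans.items()
}

for name, (usd, kind) in unit_prices.items():
    print(f"{name}: ${usd:.2f} per 1K {kind}")
```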

Both require external databases for self-hosting: Zep needs a graph database; RetainDB needs PostgreSQL.

When to Choose Zep

Choose Zep if you:

Need multi-hop graph traversal queries
Require enterprise compliance (SOC2, HIPAA)
Want explicit temporal relationship modeling

Zep's temporal graph is useful for structured enterprise deployments, though its benchmark performance lags behind newer solutions. The architecture hasn't evolved much since launch.

When to Choose RetainDB

Choose RetainDB if you:

Prioritize preference recall accuracy (88% SOTA)
Want the 0% hallucination rate RetainDB claims on documentation questions
Want affordable pricing ($20/mo Pro)

RetainDB's chronological approach is strong for preference and personalization use cases, though the small community (8 stars) and an overall LongMemEval score (79%) below its preference recall figure (88%) suggest it's less proven for general-purpose memory.

Consider Hypabase

RetainDB's 88% preference recall is strong—its chronological dump approach ensures nothing is missed by retrieval. But dumping the full memory timeline to the model is a brute-force solution: it trades token costs for coverage. Zep's graph queries are more targeted but its extraction fragments multi-attribute preferences into disconnected triples. Hypabase achieves 100% preference recall through precise extraction that doesn't require full timeline dumps.

Factor	Zep	RetainDB	Hypabase
Extraction	LLM-based into triples	Turn-by-turn with 3-turn context	AMR (formal linguistic framework)
Representation	Temporal triples	Chronological atomic memories	N-ary hyperedges
LongMemEval*	71.2%	79%	87.4%
Personalization	—	88% preference recall	100%

AMR Extraction + Hyperedges

Hypabase uses Abstract Meaning Representation (AMR), a formal framework from computational linguistics, to extract facts with their full relational context. The output uses PENMAN notation with karaka semantic roles, aiming to close the 12-percentage-point gap between RetainDB's preference recall (88%) and perfect accuracy:

"Emma prefers morning standups and uses Notion for notes"

Ad-hoc extraction (Zep):
  (Emma, prefers, morning_standups)
  (Emma, uses, Notion)
  — Two disconnected triples; "for notes" purpose is lost

Turn-by-turn extraction (RetainDB):
  memory: "Emma prefers morning standups"
  memory: "Emma uses Notion for notes"
  — Two atomic memories; retrievable via full dump but not linked

AMR extraction (Hypabase):
  (prefers :subject Emma :object standups :attribute morning)
  (uses :subject Emma :object Notion :attribute notes)

Hypabase binds the preference to its specific context—morning standups (not just "mornings"), Notion for notes (not just "uses Notion"). Zep loses the purpose qualifier. RetainDB captures it in natural language but requires dumping the full timeline for retrieval.
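One way to picture an n-ary fact with named roles is the sketch below. It reuses the `:subject`, `:object`, and `:attribute` roles from the example above, but `Hyperedge` and `find` are hypothetical names for illustration and are not Hypabase's API or storage format.

```python
from dataclasses import dataclass


@dataclass
class Hyperedge:
    """One fact: a predicate plus any number of role -> value bindings."""
    predicate: str
    roles: dict[str, str]


def find(edges: list[Hyperedge], predicate: str, **roles: str) -> list[Hyperedge]:
    """Return edges with this predicate whose roles match every given filter."""
    return [
        e
        for e in edges
        if e.predicate == predicate
        and all(e.roles.get(name) == value for name, value in roles.items())
    ]


edges = [
    Hyperedge("prefers", {"subject": "Emma", "object": "standups", "attribute": "morning"}),
    Hyperedge("uses", {"subject": "Emma", "object": "Notion", "attribute": "notes"}),
]

# "Which tool does Emma use for notes?" is answered from one edge.
note_tools = find(edges, "uses", subject="Emma", attribute="notes")
```

Because the qualifier lives on the same edge as the subject and object, the query can filter on it directly instead of joining separate triples or scanning free text.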

Why This Matters for Preference Accuracy

Benefit	How AMR + Hyperedges Deliver It
Qualified preferences	`:attribute` roles bind context to preferences—"morning standups" not just "mornings"
Tool-purpose linkage	"Notion for notes" stored as one structured fact with the purpose bound as a role, not dropped or left in free text
No full-dump needed	Structured roles enable precise queries without feeding the entire timeline to the model
Zero-infrastructure	Single SQLite file—no PostgreSQL like RetainDB, no graph DB like Zep

This is why Hypabase achieves 100% on personalization tasks vs RetainDB's 88%—preferences with qualifiers, tool-purpose bindings, and contextual detail are extracted precisely, queryable without the token cost of chronological dumps.

Learn more about Hypabase →

FAQ

Is Zep better than RetainDB?

RetainDB outperforms Zep on LongMemEval (79% vs 71.2%) and excels at preference recall (88%). Zep offers multi-hop graph queries and enterprise compliance. For higher accuracy with structured extraction, consider Hypabase (87.4%, 100% on personalization).

Can I migrate from Zep to RetainDB?

There's no direct migration path—they use different architectures and storage models. Zep stores temporal knowledge graph data; RetainDB stores chronological atomic memories in PostgreSQL. Migration requires re-ingesting conversation history.

What's the main difference?

Zep optimizes for temporal graph-based reasoning with enterprise features. RetainDB optimizes for preference recall and zero hallucination through chronological memory dumps. Hypabase optimizes for extraction quality using AMR and structured hyperedge representation.

Which is better for self-hosting?

Both require external databases. Zep needs Graphiti plus a graph database (Neo4j, FalkorDB, Kuzu, or Neptune). RetainDB needs PostgreSQL. Hypabase runs entirely in a single SQLite file with no external database required, making it the simplest self-hosting option.

Conclusion

Zep offers temporal reasoning but self-reports 71.2% on LongMemEval (independent evaluation shows 63.8%). Enterprise features are its main differentiator, though the architecture hasn't evolved much since launch.

RetainDB achieves 79% overall with 88% preference recall and a claimed 0% hallucination rate through chronological retrieval. It is affordable ($20/mo) but has a small community (8 stars) and is less proven at scale.

Hypabase achieves 87.4% through AMR-based extraction into hyperedges, a structured knowledge representation that preserves the relationships that ad-hoc extraction fragments. It scores 100% on personalization tasks.

All three are straightforward to integrate.

Try Hypabase →

*LongMemEval scores: Zep (71.2%) is self-reported; an independent evaluation reports 63.8% (arxiv:2512.13564). RetainDB (79%) is self-reported. Hypabase (87.4%) comes from its published benchmark harness.