Zep and RetainDB both use database-backed architectures for agent memory, but they differ in approach and focus. Zep uses a temporal knowledge graph (Graphiti) optimized for time-based reasoning, while RetainDB uses PostgreSQL with pgvector and chronological retrieval, optimizing specifically for preference recall and zero hallucination.
This comparison covers their architectures, benchmark performance, pricing, and ideal use cases to help you decide between them.
Quick Comparison
| Factor | Zep | RetainDB |
|---|---|---|
| Architecture | Temporal knowledge graph (Graphiti) | PostgreSQL + pgvector with chronological retrieval |
| LongMemEval* | 71.2% | 79% |
| Deployment | Cloud-first; self-host requires Graphiti + graph DB | Cloud or self-hosted (PostgreSQL required) |
| Pricing | Free / $25 / $475 / Enterprise | Free (10K ops/mo) / Pro $20/mo |
| GitHub Stars | 4.4K (Zep) + 24.8K (Graphiti) | 8 |
| Funding | Not disclosed | Not disclosed |
What is Zep?
Zep uses Graphiti, a temporal knowledge graph where time is a first-class dimension. Every fact has valid_from, valid_to, and invalid_at markers, allowing queries like "what was true in January?" or "when did this change?"
Zep positions itself around "context engineering" rather than just memory. Graphiti, the underlying engine, has 24.8K stars and supports multiple graph backends (Neo4j, FalkorDB, Kuzu, Neptune).
Key strengths:
- Best-in-class temporal reasoning
- Multi-hop graph queries
- <200ms retrieval latency
- Graphiti is open source (24.8K stars)
- Strong enterprise features (SOC2, HIPAA)
What is RetainDB?
RetainDB provides persistent memory with turn-by-turn extraction and chronological retrieval. It claims state-of-the-art performance on preference recall (88%) and a 0% hallucination rate on documentation questions.
RetainDB's pipeline processes every turn individually with 3-turn context, stores atomic memories with eventDate, documentDate, and confidence scores, and retrieves via full timeline dumps rather than lossy semantic search.
Key strengths:
- SOTA on preference recall (88%)
- 0% hallucination rate claimed
- Reproducible benchmarks (runner ships in repo)
- Affordable pricing ($20/mo Pro)
- Full chronological retrieval (no lossy search)
Architecture Comparison
Zep's Approach
Zep's Graphiti engine stores facts as nodes in a knowledge graph with explicit temporal metadata. Each edge carries validity windows tracking when facts became true and when they were superseded.
This temporal awareness is native to the architecture—not a filter applied after retrieval. The tradeoff is infrastructure complexity: self-hosting requires running Graphiti plus a graph database.
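A minimal sketch of how validity windows answer a question like "what was true in January?". This is an illustration only, not Graphiti's actual schema or API: the `TemporalFact` class, the `facts_valid_on` helper, and the example employers are all hypothetical, and only `valid_from` and `valid_to` are modeled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TemporalFact:
    """Hypothetical edge with a validity window (not Graphiti's real schema)."""
    subject: str
    predicate: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the fact is still true


def facts_valid_on(facts, when):
    """Return facts whose window contains `when` (valid_to is exclusive)."""
    return [
        f for f in facts
        if f.valid_from <= when and (f.valid_to is None or when < f.valid_to)
    ]


# Made-up example data: the second fact supersedes the first on 2024-02-01.
facts = [
    TemporalFact("Emma", "works_at", "Acme", date(2023, 6, 1), date(2024, 2, 1)),
    TemporalFact("Emma", "works_at", "Globex", date(2024, 2, 1)),
]

january = facts_valid_on(facts, date(2024, 1, 15))
march = facts_valid_on(facts, date(2024, 3, 15))
print([f.obj for f in january])
print([f.obj for f in march])
```

Because the window lives on the fact itself, the superseded fact is kept rather than overwritten, which is what makes "when did this change?" answerable.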
RetainDB's Approach
RetainDB processes every conversation turn individually with 3-turn context for extraction. Atomic memory writes include eventDate, documentDate, and confidence scores. The key design choice: retrieval provides the complete memory chronology rather than using semantic search.
This "no lossy retrieval" philosophy means the answering model sees the full timeline of memories, not just semantically similar ones. This avoids the retrieval accuracy problem entirely—at the cost of providing more context than may be necessary.
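The idea can be sketched as follows. This is a simplified illustration under stated assumptions, not RetainDB's implementation: the `AtomicMemory` class and `chronological_dump` function are hypothetical names, the field names mirror the `eventDate`, `documentDate`, and confidence fields described above, and the dates and scores are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AtomicMemory:
    """Hypothetical record shape; not RetainDB's actual storage model."""
    text: str
    event_date: date      # when the described event happened (eventDate)
    document_date: date   # when the turn was recorded (documentDate)
    confidence: float


def chronological_dump(memories):
    """Render every memory, oldest event first, with no relevance filtering."""
    ordered = sorted(memories, key=lambda m: (m.event_date, m.document_date))
    return "\n".join(
        f"[{m.event_date.isoformat()}] {m.text} (confidence={m.confidence:.2f})"
        for m in ordered
    )


# Invented example data, deliberately stored out of order.
memories = [
    AtomicMemory("Emma uses Notion for notes", date(2024, 3, 2), date(2024, 3, 2), 0.9),
    AtomicMemory("Emma prefers morning standups", date(2024, 1, 10), date(2024, 1, 12), 0.95),
]

context = chronological_dump(memories)
print(context)
```

Nothing is filtered out, so the size of `context` grows with the number of stored memories; that growth is the token cost the tradeoff refers to.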
The Key Difference
Zep uses graph structure for temporal reasoning; RetainDB uses chronological dumps for completeness.
When a user asks about their preferences, Zep traverses the knowledge graph to find relevant nodes and their temporal edges. If the graph extraction missed the preference, it won't be retrieved.
RetainDB provides the complete chronological memory dump to the answering model. The preference is guaranteed to be present (if it was extracted at the turn level), but the model must sift through all memories to find it.
The tradeoff: Zep's graph offers precise queries but depends on extraction quality. RetainDB's full dumps guarantee coverage but depend on the model's ability to parse large contexts.
Benchmark Comparison
| Benchmark | Zep | RetainDB |
|---|---|---|
| LongMemEval* | 71.2% | 79% |
| Preference Recall | — | 88% (SOTA) |
RetainDB outperforms Zep by 7.8 percentage points overall on LongMemEval (79% vs. 71.2%), and its 88% preference recall score is notably strong. Using Zep's independently evaluated score (63.8%), the gap widens to 15.2 points.
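The gaps follow directly from the scores quoted in this article. A quick check, using only those figures (the variable names are illustrative):

```python
# LongMemEval scores as quoted in this article, in percent.
retaindb = 79.0
zep_self_reported = 71.2
zep_independent = 63.8

# Differences in percentage points, rounded to one decimal place.
gap_self_reported = round(retaindb - zep_self_reported, 1)
gap_independent = round(retaindb - zep_independent, 1)

print(gap_self_reported, gap_independent)
```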
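RetainDB's 0% hallucination rate claim is notable: because the answering model always receives the full chronological dump, it never has to answer from a failed retrieval. The figure is self-reported and applies to documentation questions.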
Both score significantly below Hypabase (87.4%), which achieves 100% on personalization tasks through AMR-based extraction.
Pricing Comparison
Zep
| Tier | Price | Limits |
|---|---|---|
| Free | $0 | 1K episodes/month |
| Flex | $25/month | 20K credits, 600 req/min |
| Flex Plus | $475/month | 300K credits, 1K req/min, webhooks |
| Enterprise | Custom | SOC2, HIPAA, dedicated support |
RetainDB
| Tier | Price | Limits |
|---|---|---|
| Free | $0 | 10K ops/month |
| Pro | $20/month | 100K queries |
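RetainDB is significantly more affordable at the paid tiers: $20/month for Pro vs. Zep's $25 (Flex) to $475 (Flex Plus). RetainDB's free tier is also larger on paper (10K ops vs. 1K episodes), though ops and episodes are different units and may not map one-to-one. For budget-conscious teams, RetainDB offers better value per dollar.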
Both require external databases for self-hosting: Zep needs a graph database; RetainDB needs PostgreSQL.
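As a rough way to compare the paid tiers, the listed prices can be normalized to a cost per 1,000 units. This sketch uses only the figures from the tables above and assumes, for illustration, that one Zep credit and one RetainDB query are comparable units, which the vendors do not state.

```python
# (monthly price in USD, included units) taken from the pricing tables above.
# Zep units are "credits"; RetainDB units are "queries". Treating them as
# equivalent is an assumption made only for this back-of-the-envelope check.
plans = {
    "Zep Flex": (25, 20_000),
    "Zep Flex Plus": (475, 300_000),
    "RetainDB Pro": (20, 100_000),
}

cost_per_1k = {
    name: round(price * 1000 / units, 2)
    for name, (price, units) in plans.items()
}

for name, cost in cost_per_1k.items():
    print(f"{name}: ${cost:.2f} per 1K units")
```

Under that assumption, RetainDB Pro comes out several times cheaper per unit than either Zep tier; if a credit covers more or less work than a query, the ratio shifts accordingly.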
When to Choose Zep
Choose Zep if you:
- Need multi-hop graph traversal queries
- Require enterprise compliance (SOC2, HIPAA)
- Want explicit temporal relationship modeling
Zep's temporal graph is useful for structured enterprise deployments, though its benchmark performance lags behind newer solutions. The architecture hasn't evolved much since launch.
When to Choose RetainDB
Choose RetainDB if you:
- Prioritize preference recall accuracy (88% SOTA)
- Need zero hallucination guarantees on memory queries
- Want affordable pricing ($20/mo Pro)
RetainDB's chronological approach is strong for preference and personalization use cases, though the small community (8 stars) and an overall LongMemEval score (79%) that trails its 88% preference recall suggest it's less proven for general-purpose memory.
Consider Hypabase
RetainDB's 88% preference recall is strong—its chronological dump approach ensures nothing is missed by retrieval. But dumping the full memory timeline to the model is a brute-force solution: it trades token costs for coverage. Zep's graph queries are more targeted but its extraction fragments multi-attribute preferences into disconnected triples. Hypabase achieves 100% preference recall through precise extraction that doesn't require full timeline dumps.
| Factor | Zep | RetainDB | Hypabase |
|---|---|---|---|
| Extraction | LLM-based into triples | Turn-by-turn with 3-turn context | AMR (formal linguistic framework) |
| Representation | Temporal triples | Chronological atomic memories | N-ary hyperedges |
| LongMemEval* | 71.2% | 79% | 87.4% |
| Personalization | — | 88% preference recall | 100% |
Hypabase uses Abstract Meaning Representation (AMR), a formal framework from computational linguistics, to extract facts with their full relational context. The output uses PENMAN notation with karaka semantic roles, closing the 12-point gap between RetainDB's 88% preference recall and perfect accuracy:
"Emma prefers morning standups and uses Notion for notes"
Ad-hoc extraction (Zep):
(Emma, prefers, morning_standups)
(Emma, uses, Notion)
— Two disconnected triples; "for notes" purpose is lost
Turn-by-turn extraction (RetainDB):
memory: "Emma prefers morning standups"
memory: "Emma uses Notion for notes"
— Two atomic memories; retrievable via full dump but not linked
AMR extraction (Hypabase):
(prefers :subject Emma :object standups :attribute morning)
(uses :subject Emma :object Notion :attribute notes)
Hypabase binds the preference to its specific context—morning standups (not just "mornings"), Notion for notes (not just "uses Notion"). Zep loses the purpose qualifier. RetainDB captures it in natural language but requires dumping the full timeline for retrieval.
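To make the contrast concrete, here is a small sketch of role-labeled n-ary facts and a query over them. It is not Hypabase's API: the `Hyperedge` class and `find` function are hypothetical, and the role names simply reuse `:subject`, `:object`, and `:attribute` from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Hyperedge:
    """Hypothetical n-ary fact: one predicate plus any number of named roles."""
    predicate: str
    roles: dict = field(default_factory=dict)


def find(edges, predicate, **roles):
    """Return edges with the given predicate whose roles match every filter."""
    return [
        e for e in edges
        if e.predicate == predicate
        and all(e.roles.get(name) == value for name, value in roles.items())
    ]


edges = [
    Hyperedge("prefers", {"subject": "Emma", "object": "standups", "attribute": "morning"}),
    Hyperedge("uses", {"subject": "Emma", "object": "Notion", "attribute": "notes"}),
]

# "What does Emma use for notes?" is answered from a single fact.
tools_for_notes = find(edges, "uses", subject="Emma", attribute="notes")
print([e.roles["object"] for e in tools_for_notes])
```

Because the purpose ("notes") sits on the same fact as the tool ("Notion"), the query needs no join between separate triples and no scan of a full timeline.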
Why This Matters for Preference Accuracy
| Benefit | How AMR + Hyperedges Deliver It |
|---|---|
| Qualified preferences | :attribute roles bind context to preferences—"morning standups" not just "mornings" |
| Tool-purpose linkage | "Notion for notes" stored as one fact, not two disconnected memories |
| No full-dump needed | Structured roles enable precise queries without feeding the entire timeline to the model |
| Zero-infrastructure | Single SQLite file—no PostgreSQL like RetainDB, no graph DB like Zep |
This is why Hypabase achieves 100% on personalization tasks vs RetainDB's 88%—preferences with qualifiers, tool-purpose bindings, and contextual detail are extracted precisely, queryable without the token cost of chronological dumps.
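The single-file storage idea can be illustrated with Python's built-in `sqlite3` module. The schema below (`fact` and `role` tables) and the `add_fact` helper are assumptions made for this sketch, not Hypabase's actual schema; an in-memory database stands in for the on-disk file.

```python
import sqlite3

# ":memory:" keeps the sketch self-contained; a file path would persist it
# as a single SQLite file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact (id INTEGER PRIMARY KEY, predicate TEXT NOT NULL);
CREATE TABLE role (
    fact_id INTEGER NOT NULL REFERENCES fact(id),
    name    TEXT NOT NULL,
    value   TEXT NOT NULL
);
""")


def add_fact(predicate, **roles):
    """Insert one fact plus one row per named role (hypothetical helper)."""
    cur = conn.execute("INSERT INTO fact (predicate) VALUES (?)", (predicate,))
    conn.executemany(
        "INSERT INTO role (fact_id, name, value) VALUES (?, ?, ?)",
        [(cur.lastrowid, name, value) for name, value in roles.items()],
    )
    return cur.lastrowid


add_fact("prefers", subject="Emma", object="standups", attribute="morning")
add_fact("uses", subject="Emma", object="Notion", attribute="notes")

# Targeted query: which object does Emma "use" with attribute "notes"?
rows = conn.execute(
    """
    SELECT obj.value
    FROM fact
    JOIN role AS subj ON subj.fact_id = fact.id AND subj.name = 'subject'
    JOIN role AS attr ON attr.fact_id = fact.id AND attr.name = 'attribute'
    JOIN role AS obj  ON obj.fact_id  = fact.id AND obj.name  = 'object'
    WHERE fact.predicate = 'uses' AND subj.value = ? AND attr.value = ?
    """,
    ("Emma", "notes"),
).fetchall()
print(rows)
```

Only the matching row is returned to the caller, so the answering model sees one fact instead of the whole memory timeline.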
Learn more about Hypabase →
FAQ
Is Zep better than RetainDB?
RetainDB outperforms Zep on LongMemEval (79% vs 71.2%) and excels at preference recall (88%). Zep offers multi-hop graph queries and enterprise compliance. For higher accuracy with structured extraction, consider Hypabase (87.4%, 100% on personalization).
Can I migrate from Zep to RetainDB?
There's no direct migration path—they use different architectures and storage models. Zep stores temporal knowledge graph data; RetainDB stores chronological atomic memories in PostgreSQL. Migration requires re-ingesting conversation history.
What's the main difference?
Zep optimizes for temporal graph-based reasoning with enterprise features. RetainDB optimizes for preference recall and zero hallucination through chronological memory dumps. Hypabase optimizes for extraction quality using AMR and structured hyperedge representation.
Which is better for self-hosting?
Both require external databases. Zep needs Graphiti plus a graph database (Neo4j/FalkorDB/Kuzu). RetainDB needs PostgreSQL. Hypabase runs entirely in a single SQLite file with no external database required—the simplest self-hosting option.
Conclusion
Zep offers temporal reasoning but self-reports 71.2% on LongMemEval (independent evaluation shows 63.8%). Enterprise features are its main differentiator, though the architecture hasn't evolved much since launch.
RetainDB achieves 79% overall with 88% preference recall and 0% hallucination through chronological retrieval. Affordable ($20/mo) but small community (8 stars) and less proven at scale.
Hypabase achieves 87.4% through AMR-based extraction into hyperedges—structured knowledge representation that preserves relationships ad-hoc extraction fragments. 100% on personalization tasks.
All three are straightforward to integrate.
Try Hypabase →
*LongMemEval scores: Zep (71.2%) self-reported; independent evaluation shows 63.8% (arXiv:2512.13564). RetainDB (79%) self-reported. Hypabase (87.4%) from published benchmark harness.