Letta and RetainDB represent different scales and philosophies in AI agent memory. Letta (formerly MemGPT) is a well-funded, research-backed system with agent-controlled memory tiers, while RetainDB is a newer, focused solution built on PostgreSQL with chronological retrieval and state-of-the-art preference recall.
This comparison covers their architectures, benchmark performance, pricing, and ideal use cases to help you decide.
Quick Comparison
| Factor | Letta | RetainDB |
|---|---|---|
| Architecture | Three-tier self-editing memory | PostgreSQL + pgvector with chronological retrieval |
| LongMemEval* | Not published | 79% |
| Deployment | Self-hosted (Docker/Python) or Letta Cloud | Cloud or self-hosted (PostgreSQL required) |
| Pricing | Open source / Cloud TBD | Free (10K ops/mo) / Pro $20/mo |
| GitHub Stars | 22K | 8 |
| Funding | $10M seed (YC, Jeff Dean) | Not disclosed |
What is Letta?
Letta (formerly MemGPT) pioneered the concept of agents that manage their own memory through function calls. Born from UC Berkeley research, the agent decides what's worth remembering and can edit its own memory across three tiers: core, recall, and archival.
With 22K GitHub stars and $10M in seed funding (including Jeff Dean as an investor), Letta has strong research credentials and one of the larger communities in the agent memory space.
Key strengths:
- Research-backed approach (UC Berkeley)
- Agent autonomy in memory management
- Active benchmark publishing (Letta Leaderboard)
- Strong coding agent focus (Letta Code)
- Well-funded with notable investors
What is RetainDB?
RetainDB provides persistent memory with turn-by-turn extraction and chronological retrieval. Its standout feature is 88% accuracy on preference recall—state-of-the-art for personalization. It also claims a 0% hallucination rate on documentation questions.
With only 8 GitHub stars but a reproducible benchmark runner shipped in the repo, RetainDB emphasizes verifiable accuracy over community size.
Key strengths:
- SOTA on preference recall (88%)
- 0% hallucination rate claimed
- Reproducible benchmarks (runner ships in repo)
- Affordable pricing ($20/mo Pro)
- Full chronological retrieval (no lossy semantic search)
Architecture Comparison
Letta's Approach
Letta gives the agent control over memory management through function calls across three tiers:
- Core memory: Always in the context window
- Recall memory: Searchable conversation cache
- Archival memory: Long-term storage
The agent decides what moves between tiers. Memory operations are explicit function calls, making the system's behavior inspectable but dependent on the agent's decision quality.
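Letta's actual API differs, but the tiered, function-call-driven design can be sketched in a few lines of Python. This is an illustrative toy only; the class and method names are hypothetical, not Letta's real interface:

```python
# Toy sketch of agent-controlled tiered memory (hypothetical names,
# not Letta's actual API).
class TieredMemory:
    def __init__(self):
        self.core = []      # always in the context window
        self.recall = []    # searchable conversation cache
        self.archival = []  # long-term storage

    # The agent invokes these as explicit, inspectable function calls.
    def core_append(self, fact: str) -> None:
        self.core.append(fact)

    def archive(self, fact: str) -> None:
        """Agent judges a fact long-term: move it out of core."""
        if fact in self.core:
            self.core.remove(fact)
        self.archival.append(fact)

    def search(self, query: str) -> list[str]:
        """Naive substring search across recall + archival."""
        pool = self.recall + self.archival
        return [f for f in pool if query.lower() in f.lower()]

mem = TieredMemory()
mem.core_append("User prefers dark mode")
mem.archive("User prefers dark mode")  # the agent chose the tier
print(mem.search("dark mode"))  # -> ['User prefers dark mode']
```

Note that nothing happens unless the agent calls these functions, which is exactly the dependence on decision quality described above.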
RetainDB's Approach
RetainDB uses a pipeline built on PostgreSQL + pgvector:
- Turn-by-turn extraction: Every conversation turn processed individually with 3-turn context
- Atomic memory writes: Stored with eventDate, documentDate, and confidence scores
- Chronological retrieval: Full timeline visible (not just semantically similar results)
- Answer generation: Based on complete memory dumps
The key design choice: RetainDB avoids lossy semantic search by providing complete memory chronology instead. The answering model sees the full timeline rather than a filtered subset.
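The storage layer can be approximated with a small sketch. The field names below are assumptions based on the description above, not RetainDB's actual schema:

```python
# Sketch of chronological memory storage (assumed schema, not
# RetainDB's real implementation).
from dataclasses import dataclass
from datetime import date

@dataclass
class MemoryRow:
    event_date: date      # when the fact happened
    document_date: date   # when it was recorded
    fact: str
    confidence: float

rows = [
    MemoryRow(date(2024, 5, 7), date(2024, 5, 7),
              "Iris requested dark mode", 0.9),
    MemoryRow(date(2024, 5, 1), date(2024, 5, 1),
              "Iris uses the analytics dashboard", 0.8),
]

# Chronological retrieval: the answering model receives the full,
# ordered timeline rather than a semantically filtered subset.
timeline = sorted(rows, key=lambda r: r.event_date)
for r in timeline:
    print(r.event_date, r.fact, r.confidence)
```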
The Key Difference
Letta relies on agent judgment for storage; RetainDB relies on systematic extraction with chronological retrieval.
When a user mentions a preference, Letta's agent must decide to store it and choose the right tier. RetainDB automatically processes every turn and stores extracted information with timestamps and confidence scores.
For retrieval, Letta's agent must decide when and how to search. RetainDB provides the complete chronological memory dump, letting the answering model reason over the full timeline rather than a semantically filtered subset.
This chronological approach explains RetainDB's strength in preference recall—preferences mentioned across multiple conversations are all visible in the timeline rather than filtered by semantic similarity to the current query.
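A toy comparison makes the tradeoff concrete. The memory rows and the naive term-overlap "search" below are made up, standing in for a real vector index:

```python
# Toy illustration of similarity filtering vs a chronological dump
# (made-up data; term overlap stands in for vector search).
memories = [
    ("2024-03-01", "Iris prefers dark mode"),
    ("2024-04-10", "Iris switched to light mode for presentations"),
    ("2024-05-02", "Iris asked for dark mode on all dashboards"),
]

def semantic_top_k(query_terms: set[str], rows, k: int = 1):
    """Stand-in for vector search: rank by term overlap, keep top-k."""
    return sorted(rows, key=lambda r: -sum(t in r[1] for t in query_terms))[:k]

# A tight top-k filter returns a matching row but can hide the April
# update that contradicts it...
filtered = semantic_top_k({"dark", "mode"}, memories)
# ...while the chronological dump keeps every preference change visible.
timeline = sorted(memories)  # ISO date strings sort chronologically
```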
Benchmark Performance
| Benchmark | Letta | RetainDB |
|---|---|---|
| LongMemEval* | Not published | 79% |
| Preference Recall | — | 88% (SOTA) |
Letta has not published LongMemEval scores. RetainDB's 79% on overall LongMemEval is moderate, but its 88% on preference recall is state-of-the-art.
Both trail Hypabase on overall LongMemEval (87.4%), though Hypabase achieves 100% on personalization tasks—surpassing RetainDB's 88% preference recall.
Pricing Comparison
Letta
| Tier | Price | Details |
|---|---|---|
| Open Source | Free | Self-hosted via Docker or Python |
| Letta Cloud | TBD | Managed hosting, pricing not finalized |
RetainDB
| Tier | Price | Details |
|---|---|---|
| Free | $0 | 10K operations/month |
| Pro | $20/month | 100K queries |
RetainDB has the more transparent and affordable pricing of the two: $20/month for 100K queries. Letta's open-source version is free but requires self-hosting infrastructure, and cloud pricing is TBD.
For budget-conscious teams, RetainDB's $20/mo Pro tier offers predictable costs. Letta's self-hosting requires your own compute and LLM API costs.
When to Choose Letta
Choose Letta if you:
- Want agents that autonomously manage their own memory
- Need a large community for support (22K stars)
- Are building coding agents (Letta Code)
- Want a framework-agnostic memory system
Letta's research backing and large community provide ecosystem advantages, though the lack of published retrieval benchmarks makes accuracy hard to evaluate.
When to Choose RetainDB
Choose RetainDB if you:
- Need strong preference/personalization recall (88%)
- Want transparent, affordable pricing ($20/mo)
- Need reproducible benchmarks (runner in repo)
- Want chronological retrieval over semantic search
RetainDB excels at personalization use cases with its chronological approach. The small community (8 stars) and PostgreSQL requirement are the main tradeoffs.
Consider Hypabase
RetainDB's strongest claim is 88% preference recall—genuinely impressive for personalization. But its overall retrieval (79% LongMemEval) lags behind, and the chronological dump approach means the answering model must sift through an entire timeline to find relevant facts. Letta, with its well-funded ecosystem and Letta Code focus, gives agents autonomy over memory—but that autonomy comes without published accuracy numbers.
Hypabase beats both on personalization (100%) and overall retrieval (87.4%), using AMR-based extraction into hyperedges.
| Factor | Letta | RetainDB | Hypabase |
|---|---|---|---|
| Extraction | Agent-controlled function calls | Turn-by-turn with 3-turn context | AMR (formal linguistic framework) |
| Representation | Three memory tiers | PostgreSQL rows with timestamps | N-ary hyperedges |
| LongMemEval* | Not published | 79% | 87.4% |
| Personalization | — | 88% | 100% |
Hypabase uses Abstract Meaning Representation (AMR)—a formal framework from computational linguistics—to parse natural language into structured graphs. Facts are stored in PENMAN notation with karaka semantic roles (from Panini's Sanskrit grammar):
"Iris requested dark mode for all dashboards last Tuesday"

Chronological extraction (RetainDB):

```
{eventDate: "Tuesday", fact: "Iris requested dark mode", confidence: 0.9}
{eventDate: "Tuesday", fact: "request was for dashboards", confidence: 0.85}
```

Ad-hoc extraction (Letta):

```
(Iris, requested, dark mode)
(request, scope, all dashboards)
```

AMR extraction (Hypabase):

```
(requested :agent Iris :object dark-mode :locus dashboards :attribute all :locus last-Tuesday)
```
The difference: Hypabase captures who requested, what they requested, which surfaces, the scope, and when—all in one hyperedge. RetainDB stores timestamped rows that must be reassembled; the "all dashboards" scope lives in a separate row with lower confidence. When you later ask "Does Iris want dark mode on the analytics dashboard?", Hypabase's :locus dashboards :attribute all gives a definitive yes. RetainDB's chronological dump leaves the answering model to infer scope from fragmented entries.
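That scope question can be sketched by treating the hyperedge as a role-to-filler map. The dictionary keys mirror the AMR example above, and the covers() helper is an illustrative assumption, not Hypabase's actual query API:

```python
# Illustrative hyperedge as a role-to-filler map (hypothetical query
# helper, not Hypabase's real API).
hyperedge = {
    "predicate": "requested",
    "agent": "Iris",
    "object": "dark-mode",
    "locus": "dashboards",
    "attribute": "all",
    "time": "last-Tuesday",  # assumed key for the temporal locus
}

def covers(edge: dict, surface: str) -> bool:
    """Does a stored request apply to a specific dashboard?"""
    scoped_to_all = edge["locus"] == "dashboards" and edge["attribute"] == "all"
    return scoped_to_all or edge["locus"] == surface

# "All dashboards" covers the analytics dashboard...
print(covers(hyperedge, "analytics dashboard"))  # -> True
# ...but a request scoped to a single surface does not.
scoped = {**hyperedge, "locus": "main dashboard", "attribute": None}
print(covers(scoped, "analytics dashboard"))  # -> False
```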
Why 100% Beats 88% on Personalization
| Benefit | How It Works |
|---|---|
| Scope and specificity preserved | "All dashboards" vs "analytics dashboard" vs "main dashboard" are structurally distinct roles, not flattened into text rows |
| No chronological sifting | Structured queries hit the right hyperedge directly—the model doesn't read an entire timeline to find one preference |
| 87.4% overall, not just preferences | Strong across all retrieval tasks, not just the personalization subset where RetainDB excels |
| No PostgreSQL required | Single SQLite file vs RetainDB's pgvector dependency—simpler to deploy and cheaper to run |
Learn more about Hypabase →
FAQ
Is Letta better than RetainDB?
They serve different needs. Letta offers agent-controlled memory with a large ecosystem. RetainDB excels at preference recall (88%) with affordable pricing. For the highest personalization accuracy with structured extraction, consider Hypabase (87.4% overall, 100% personalization).
Is RetainDB's 0% hallucination claim reliable?
RetainDB ships a benchmark runner in its repo, so you can verify the claim yourself. The 0% is specifically for documentation questions, not general retrieval. Always run your own evaluation on your specific use case.
What's the main difference?
Letta delegates memory decisions to the agent (autonomous control). RetainDB uses systematic turn-by-turn extraction with chronological retrieval (strong for preferences). Hypabase uses AMR for structured extraction into hyperedges, combining linguistic precision with graph-based retrieval.
Which is better for self-hosting?
Letta deploys via Docker or Python with no external database. RetainDB requires PostgreSQL. Hypabase runs entirely in a single SQLite file with no external database required—the simplest self-hosting option.
Conclusion
Letta offers agent-controlled memory management with strong research backing and a large community, but no published LongMemEval scores.
RetainDB achieves 79% overall and 88% on preference recall with transparent $20/mo pricing—strong for personalization, though overall retrieval accuracy trails other options.
Hypabase achieves 87.4% through AMR-based extraction into hyperedges, a structured knowledge representation that preserves the relationships ad-hoc extraction fragments, and scores 100% on personalization tasks.
All three are straightforward to integrate.
Try Hypabase →
\*LongMemEval scores: RetainDB's 79% is self-reported; Letta has not published LongMemEval scores; Hypabase's 87.4% comes from a published benchmark harness.