Letta and RetainDB represent different scales and philosophies in AI agent memory. Letta (formerly MemGPT) is a well-funded, research-backed system with agent-controlled memory tiers, while RetainDB is a newer, focused solution built on PostgreSQL with chronological retrieval and state-of-the-art preference recall.
This comparison covers their architectures, benchmark performance, pricing, and ideal use cases to help you decide.
Quick Comparison
| Factor | Letta | RetainDB |
|---|---|---|
| Architecture | Three-tier self-editing memory | PostgreSQL + pgvector with chronological retrieval |
| LongMemEval* | Not published | 79% |
| Deployment | Self-hosted (Docker/Python) or Letta Cloud | Cloud or self-hosted (PostgreSQL required) |
| Pricing | Open source / Cloud TBD | Free (10K ops/mo) / Pro $20/mo |
| GitHub Stars | 22K | 8 |
| Funding | $10M seed (YC, Jeff Dean) | Not disclosed |
What is Letta?
Letta (formerly MemGPT) pioneered the concept of agents that manage their own memory through function calls. Born from UC Berkeley research, the agent decides what's worth remembering and can edit its own memory across three tiers: core, recall, and archival.
With 22K GitHub stars and $10M in seed funding (including Jeff Dean as an investor), Letta has strong research credentials and one of the larger communities in the agent memory space.
Key strengths:
- Research-backed approach (UC Berkeley)
- Agent autonomy in memory management
- Active benchmark publishing (Letta Leaderboard)
- Strong coding agent focus (Letta Code)
- Well-funded with notable investors
What is RetainDB?
RetainDB provides persistent memory with turn-by-turn extraction and chronological retrieval. Its standout feature is 88% accuracy on preference recall—state-of-the-art for personalization. It also claims a 0% hallucination rate on documentation questions.
With only 8 GitHub stars but a reproducible benchmark runner shipped in the repo, RetainDB emphasizes verifiable accuracy over community size.
Key strengths:
- SOTA on preference recall (88%)
- 0% hallucination rate claimed
- Reproducible benchmarks (runner ships in repo)
- Affordable pricing ($20/mo Pro)
- Full chronological retrieval (no lossy semantic search)
Architecture Comparison
Letta's Approach
Letta gives the agent control over memory management through function calls across three tiers:
- Core memory: Always in the context window
- Recall memory: Searchable conversation cache
- Archival memory: Long-term storage
The agent decides what moves between tiers. Memory operations are explicit function calls, making the system's behavior inspectable but dependent on the agent's decision quality.
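Letta's actual API differs, but the tiered, function-call-driven design can be sketched in a few lines of Python. This is an illustrative toy only; the class and method names are hypothetical, not Letta's real interface:

```python
# Toy sketch of agent-controlled tiered memory (hypothetical names,
# not Letta's actual API).
class TieredMemory:
    def __init__(self):
        self.core = []      # always in the context window
        self.recall = []    # searchable conversation cache
        self.archival = []  # long-term storage

    # The agent invokes these as explicit, inspectable function calls.
    def core_append(self, fact: str) -> None:
        self.core.append(fact)

    def archive(self, fact: str) -> None:
        """Agent judges a fact long-term: move it out of core."""
        if fact in self.core:
            self.core.remove(fact)
        self.archival.append(fact)

    def search(self, query: str) -> list[str]:
        """Naive substring search across recall + archival."""
        pool = self.recall + self.archival
        return [f for f in pool if query.lower() in f.lower()]

mem = TieredMemory()
mem.core_append("User prefers dark mode")
mem.archive("User prefers dark mode")  # the agent chose the tier
print(mem.search("dark mode"))  # -> ['User prefers dark mode']
```

Note that nothing happens unless the agent calls these functions, which is exactly the dependence on decision quality described above.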
RetainDB's Approach
RetainDB uses a pipeline built on PostgreSQL + pgvector:
- Turn-by-turn extraction: Every conversation turn processed individually with 3-turn context
- Atomic memory writes: Stored with eventDate, documentDate, and confidence scores
- Chronological retrieval: Full timeline visible (not just semantically similar results)
- Answer generation: Based on complete memory dumps
The key design choice: RetainDB avoids lossy semantic search by providing complete memory chronology instead. The answering model sees the full timeline rather than a filtered subset.
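The storage layer can be approximated with a small sketch. The field names below are assumptions based on the description above, not RetainDB's actual schema:

```python
# Sketch of chronological memory storage (assumed schema, not
# RetainDB's real implementation).
from dataclasses import dataclass
from datetime import date

@dataclass
class MemoryRow:
    event_date: date      # when the fact happened
    document_date: date   # when it was recorded
    fact: str
    confidence: float

rows = [
    MemoryRow(date(2024, 5, 7), date(2024, 5, 7),
              "Iris requested dark mode", 0.9),
    MemoryRow(date(2024, 5, 1), date(2024, 5, 1),
              "Iris uses the analytics dashboard", 0.8),
]

# Chronological retrieval: the answering model receives the full,
# ordered timeline rather than a semantically filtered subset.
timeline = sorted(rows, key=lambda r: r.event_date)
for r in timeline:
    print(r.event_date, r.fact, r.confidence)
```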
The Key Difference
Letta relies on agent judgment for storage; RetainDB relies on systematic extraction with chronological retrieval.
When a user mentions a preference, Letta's agent must decide to store it and choose the right tier. RetainDB automatically processes every turn and stores extracted information with timestamps and confidence scores.
For retrieval, Letta's agent must decide when and how to search. RetainDB provides the complete chronological memory dump, letting the answering model reason over the full timeline rather than a semantically filtered subset.
This chronological approach explains RetainDB's strength in preference recall—preferences mentioned across multiple conversations are all visible in the timeline rather than filtered by semantic similarity to the current query.
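A toy comparison makes the tradeoff concrete. The memory rows and the naive term-overlap "search" below are made up, standing in for a real vector index:

```python
# Toy illustration of similarity filtering vs a chronological dump
# (made-up data; term overlap stands in for vector search).
memories = [
    ("2024-03-01", "Iris prefers dark mode"),
    ("2024-04-10", "Iris switched to light mode for presentations"),
    ("2024-05-02", "Iris asked for dark mode on all dashboards"),
]

def semantic_top_k(query_terms: set[str], rows, k: int = 1):
    """Stand-in for vector search: rank by term overlap, keep top-k."""
    return sorted(rows, key=lambda r: -sum(t in r[1] for t in query_terms))[:k]

# A tight top-k filter returns a matching row but can hide the April
# update that contradicts it...
filtered = semantic_top_k({"dark", "mode"}, memories)
# ...while the chronological dump keeps every preference change visible.
timeline = sorted(memories)  # ISO date strings sort chronologically
```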
Benchmark Performance
| Benchmark | Letta | RetainDB |
|---|---|---|
| LongMemEval* | Not published | 79% |
| Preference Recall | — | 88% (SOTA) |
Letta has not published LongMemEval scores. RetainDB's 79% on overall LongMemEval is moderate, but its 88% on preference recall is state-of-the-art.
Both trail Hypabase on overall LongMemEval (87.4%), though Hypabase achieves 100% on personalization tasks—surpassing RetainDB's 88% preference recall.
Pricing Comparison
Letta
| Tier | Price | Details |
|---|---|---|
| Open Source | Free | Self-hosted via Docker or Python |
| Letta Cloud | TBD | Managed hosting, pricing not finalized |
RetainDB
| Tier | Price | Details |
|---|---|---|
| Free | $0 | 10K operations/month |
| Pro | $20/month | 100K queries |
RetainDB has the more transparent and affordable pricing of the two: $20/month for 100K queries. Letta's open-source version is free but requires self-hosting infrastructure, and cloud pricing is TBD.
For budget-conscious teams, RetainDB's $20/mo Pro tier offers predictable costs. Letta's self-hosting requires your own compute and LLM API costs.
When to Choose Letta
Choose Letta if you:
- Want agents that autonomously manage their own memory
- Need a large community for support (22K stars)
- Are building coding agents (Letta Code)
- Want a framework-agnostic memory system
Letta's research backing and large community provide ecosystem advantages, though the lack of published retrieval benchmarks makes accuracy hard to evaluate.
When to Choose RetainDB
Choose RetainDB if you:
- Need strong preference/personalization recall (88%)
- Want transparent, affordable pricing ($20/mo)
- Need reproducible benchmarks (runner in repo)
- Want chronological retrieval over semantic search
RetainDB excels at personalization use cases with its chronological approach. The small community (8 stars) and PostgreSQL requirement are the main tradeoffs.
Consider Hypabase
RetainDB's strongest claim is 88% preference recall—genuinely impressive for personalization. But its overall retrieval (79% LongMemEval) lags behind, and the chronological dump approach means the answering model must sift through an entire timeline to find relevant facts. Letta, with its well-funded ecosystem and Letta Code focus, gives agents autonomy over memory—but that autonomy comes without published accuracy numbers.
Hypabase beats both on personalization (100%) and overall retrieval (87.4%), using AMR-based extraction into hyperedges.
| Factor | Letta | RetainDB | Hypabase |
|---|---|---|---|
| Extraction | Agent-controlled function calls | Turn-by-turn with 3-turn context | AMR (formal linguistic framework) |
| Representation | Three memory tiers | PostgreSQL rows with timestamps | N-ary hyperedges |
| LongMemEval* | Not published | 79% | 87.4% |
| Personalization | — | 88% | 100% |
Hypabase uses Abstract Meaning Representation (AMR)—a formal framework from computational linguistics—to parse natural language into structured graphs. Facts are stored in PENMAN notation with karaka semantic roles (from Panini's Sanskrit grammar):
"Iris requested dark mode for all dashboards last Tuesday"

Chronological extraction (RetainDB):

```
{eventDate: "Tuesday", fact: "Iris requested dark mode", confidence: 0.9}
{eventDate: "Tuesday", fact: "request was for dashboards", confidence: 0.85}
```

Ad-hoc extraction (Letta):

```
(Iris, requested, dark mode)
(request, scope, all dashboards)
```

AMR extraction (Hypabase):

```
(requested :agent Iris :object dark-mode :locus dashboards :attribute all :locus last-Tuesday)
```
The difference: Hypabase captures who requested, what they requested, which surfaces, the scope, and when—all in one hyperedge. RetainDB stores timestamped rows that must be reassembled; the "all dashboards" scope lives in a separate row with lower confidence. When you later ask "Does Iris want dark mode on the analytics dashboard?", Hypabase's :locus dashboards :attribute all gives a definitive yes. RetainDB's chronological dump leaves the answering model to infer scope from fragmented entries.
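That scope question can be sketched by treating the hyperedge as a role-to-filler map. The dictionary keys mirror the AMR example above, and the covers() helper is an illustrative assumption, not Hypabase's actual query API:

```python
# Illustrative hyperedge as a role-to-filler map (hypothetical query
# helper, not Hypabase's real API).
hyperedge = {
    "predicate": "requested",
    "agent": "Iris",
    "object": "dark-mode",
    "locus": "dashboards",
    "attribute": "all",
    "time": "last-Tuesday",  # assumed key for the temporal locus
}

def covers(edge: dict, surface: str) -> bool:
    """Does a stored request apply to a specific dashboard?"""
    scoped_to_all = edge["locus"] == "dashboards" and edge["attribute"] == "all"
    return scoped_to_all or edge["locus"] == surface

# "All dashboards" covers the analytics dashboard...
print(covers(hyperedge, "analytics dashboard"))  # -> True
# ...but a request scoped to a single surface does not.
scoped = {**hyperedge, "locus": "main dashboard", "attribute": None}
print(covers(scoped, "analytics dashboard"))  # -> False
```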
Why 100% Beats 88% on Personalization
| Benefit | How It Works |
|---|---|
| Scope and specificity preserved | "All dashboards" vs "analytics dashboard" vs "main dashboard" are structurally distinct roles, not flattened into text rows |
| No chronological sifting | Structured queries hit the right hyperedge directly—the model doesn't read an entire timeline to find one preference |
| 87.4% overall, not just preferences | Strong across all retrieval tasks, not just the personalization subset where RetainDB excels |
| No PostgreSQL required | Single SQLite file vs RetainDB's pgvector dependency—simpler to deploy and cheaper to run |
Learn more about Hypabase →
FAQ
Is Letta better than RetainDB?
They serve different needs. Letta offers agent-controlled memory with a large ecosystem. RetainDB excels at preference recall (88%) with affordable pricing. For the highest personalization accuracy with structured extraction, consider Hypabase (87.4% overall, 100% personalization).
Is RetainDB's 0% hallucination claim reliable?
RetainDB ships a benchmark runner in its repo, so you can verify the claim yourself. The 0% is specifically for documentation questions, not general retrieval. Always run your own evaluation on your specific use case.
What's the main difference?
Letta delegates memory decisions to the agent (autonomous control). RetainDB uses systematic turn-by-turn extraction with chronological retrieval (strong for preferences). Hypabase uses AMR for structured extraction into hyperedges, combining linguistic precision with graph-based retrieval.
Which is better for self-hosting?
Letta deploys via Docker or Python with no external database. RetainDB requires PostgreSQL. Hypabase runs entirely in a single SQLite file with no external database required—the simplest self-hosting option.
Conclusion
Letta offers agent-controlled memory management with strong research backing and a large community, but no published LongMemEval scores.
RetainDB achieves 79% overall and 88% on preference recall with transparent $20/mo pricing—strong for personalization, though overall retrieval accuracy trails other options.
Hypabase achieves 87.4% through AMR-based extraction into hyperedges, a structured knowledge representation that preserves the relationships ad-hoc extraction fragments, and scores 100% on personalization tasks.
All three are straightforward to integrate.
Try Hypabase →
\*LongMemEval scores: RetainDB's 79% is self-reported; Letta has not published LongMemEval scores; Hypabase's 87.4% comes from a published benchmark harness.