Zep vs Letta: Which AI Memory Solution Should You Choose?

Zep and Letta represent two fundamentally different philosophies for agent memory. Zep uses a temporal knowledge graph (Graphiti) where time is a first-class dimension, while Letta (formerly MemGPT) gives agents control over their own memory through self-editing tiers.

This comparison covers their architectures, benchmark performance, pricing, and ideal use cases to help you decide.

Quick Comparison

Factor	Zep	Letta
Architecture	Temporal knowledge graph (Graphiti)	Three-tier self-editing memory
LongMemEval*	71.2%	Not published
Deployment	Cloud-first; self-host requires Graphiti + graph DB	Self-hosted (Docker/Python) or Letta Cloud
Pricing	Free / $25 / $475 / Enterprise	Open source / Cloud pricing TBD
GitHub Stars	4.4K (Zep) + 24.8K (Graphiti)	22K
Funding	Not disclosed	$10M seed (YC, Jeff Dean invested)

What is Zep?

Zep uses Graphiti, a temporal knowledge graph where time is a first-class dimension. Every fact has valid_from, valid_to, and invalid_at markers, allowing queries like "what was true in January?" or "when did this change?"
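The two example queries can be made concrete with a small sketch. It is not Graphiti's schema or API: the `TemporalFact` class, its fields (named after the markers mentioned above), and both helper functions are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TemporalFact:
    """A fact with a validity window (hypothetical structure, not Graphiti's)."""
    subject: str
    predicate: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still true"


def facts_as_of(facts: list[TemporalFact], when: date) -> list[TemporalFact]:
    """Answer "what was true on this date?" by filtering on the validity window."""
    return [
        f for f in facts
        if f.valid_from <= when and (f.valid_to is None or when < f.valid_to)
    ]


def change_dates(facts: list[TemporalFact], subject: str, predicate: str) -> list[date]:
    """Answer "when did this change?": start dates of every version after the first."""
    versions = sorted(
        (f for f in facts if f.subject == subject and f.predicate == predicate),
        key=lambda f: f.valid_from,
    )
    return [f.valid_from for f in versions[1:]]


facts = [
    TemporalFact("alice", "works_at", "Acme", date(2023, 3, 1), date(2025, 2, 1)),
    TemporalFact("alice", "works_at", "Globex", date(2025, 2, 1)),
]

january = facts_as_of(facts, date(2025, 1, 15))
print([f.obj for f in january])  # ['Acme']
print([d.isoformat() for d in change_dates(facts, "alice", "works_at")])  # ['2025-02-01']
```

The window is treated as half-open (`valid_from` inclusive, `valid_to` exclusive) so that exactly one version is current on the day of a change.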

Zep positions itself around "context engineering" rather than just memory. Graphiti, the underlying engine, has 24.8K stars and supports multiple graph backends (Neo4j, FalkorDB, Kuzu, Neptune).

Key strengths:

Best-in-class temporal reasoning
Multi-hop graph queries
<200ms retrieval latency
Graphiti is open source (24.8K stars)
Strong enterprise features (SOC2, HIPAA)

What is Letta?

Letta (formerly MemGPT) pioneered the concept of agents that manage their own memory through function calls. It grew out of UC Berkeley research and is backed by $10M in seed funding (with investors including Jeff Dean). In Letta, the agent decides what's worth remembering and can edit its own memory tiers.

Letta organizes memory into three tiers: Core memory (always in the context window), Recall memory (searchable conversation cache), and Archival memory (long-term storage). The agent decides what moves between tiers.

Key strengths:

Research-backed approach (UC Berkeley)
Agent autonomy in memory management
Active benchmark publishing (Letta Leaderboard)
Strong coding agent focus (Letta Code)
Well-funded with notable investors

Architecture Comparison

Zep's Approach

Zep's Graphiti engine stores entities as nodes and facts as edges in a knowledge graph with explicit temporal metadata. Each edge carries a validity window that tracks when the fact became true and when it was superseded.

This temporal awareness is native to the architecture—not a filter applied after retrieval. The tradeoff is infrastructure complexity: self-hosting requires running Graphiti plus a graph database.

Letta's Approach

Letta gives the agent itself control over memory management through function calls. The agent decides what to store, where to store it, and when to move information between tiers. Core memory stays in the context window, Recall memory is searchable, and Archival memory holds long-term facts.

This creates autonomous memory management but introduces less predictable behavior—the agent's memory decisions depend on the underlying model's judgment.
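A toy model of the three tiers may help show where the unpredictability comes from. This is a sketch under stated assumptions, not Letta's API: the `TieredMemory` class and its method names are hypothetical, and the tool calls that an LLM would normally emit are made by hand here.

```python
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    """Toy three-tier memory; all names are hypothetical, not Letta's API."""
    core: dict[str, str] = field(default_factory=dict)   # always in the context window
    recall: list[str] = field(default_factory=list)      # searchable conversation log
    archival: list[str] = field(default_factory=list)    # long-term storage

    # The methods below stand in for the "tools" an agent could call.
    def core_replace(self, key: str, value: str) -> None:
        """Overwrite a core entry, archiving the previous value if there was one."""
        old = self.core.get(key)
        if old is not None:
            self.archival.append(f"{key} was previously: {old}")
        self.core[key] = value

    def recall_search(self, query: str) -> list[str]:
        """Naive case-insensitive substring search over the conversation log."""
        return [m for m in self.recall if query.lower() in m.lower()]

    def render_context(self) -> str:
        """Only core memory is rendered into the prompt on every turn."""
        return "\n".join(f"{k}: {v}" for k, v in sorted(self.core.items()))


memory = TieredMemory()
memory.core_replace("employer", "Acme")
memory.recall.append("user: I just started at Globex")

# In a real agent the model decides whether to issue this call.
# If it does not, core memory keeps saying "Acme".
memory.core_replace("employer", "Globex")

print(memory.render_context())         # employer: Globex
print(memory.archival)                 # ['employer was previously: Acme']
print(memory.recall_search("globex"))  # ['user: I just started at Globex']
```

The storage logic is deterministic. The uncertainty lies entirely in whether and when the model chooses to call `core_replace`.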

The Key Difference

Zep structures memory around time; Letta structures memory around agent autonomy.

When a user changes jobs, Zep's temporal edges explicitly mark the old job as superseded and the new one as current. The system handles the update deterministically.

In Letta, the agent must decide to update its own Core memory with the new job and archive the old one. Whether this happens correctly depends on the agent's judgment—a more flexible but less predictable approach.

This means Zep is more reliable for factual updates, while Letta offers more flexibility for agents that need to decide what matters.
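The job-change example on the Zep side can be sketched as a deterministic update rule: a new fact closes the open edge it supersedes. The `Edge` class, the `assert_fact` helper, and the assumption that `works_at` is single-valued are all illustrative and are not taken from Zep or Graphiti.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Edge:
    """A fact edge with a validity window (illustrative field names)."""
    subject: str
    predicate: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "currently true"


def assert_fact(edges: list[Edge], subject: str, predicate: str,
                obj: str, when: date) -> None:
    """Add a fact, closing any open edge that it supersedes.

    Assumes the predicate is single-valued (one current employer per
    person), which is a simplification made for this sketch.
    """
    for edge in edges:
        if (edge.subject, edge.predicate) == (subject, predicate) and edge.valid_to is None:
            if edge.obj == obj:
                return  # already the current fact; nothing to do
            edge.valid_to = when  # mark the old fact as superseded
    edges.append(Edge(subject, predicate, obj, valid_from=when))


edges: list[Edge] = []
assert_fact(edges, "user", "works_at", "Acme", date(2023, 3, 1))
assert_fact(edges, "user", "works_at", "Globex", date(2025, 2, 1))

current = [e.obj for e in edges if e.valid_to is None]
print(current)                         # ['Globex']
print(edges[0].valid_to.isoformat())   # 2025-02-01
```

The old job is never deleted. It stays in the graph with a closed window, which is what keeps historical queries answerable.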

Benchmark Performance

Benchmark	Zep	Letta
LongMemEval*	71.2%	Not published

Zep self-reports 71.2% on LongMemEval, though independent evaluation shows 63.8%. Letta has not published LongMemEval scores, instead maintaining its own benchmark suite (Letta Leaderboard).

The lack of a common benchmark makes direct comparison difficult. Without published scores, it's unclear how Letta's self-editing approach performs on standardized retrieval tasks.

Zep's score, self-reported or independent, falls well below that of Hypabase (87.4%), which uses AMR-based extraction for higher retrieval accuracy. Letta has no published LongMemEval score to compare.

Pricing Comparison

Zep

Tier	Price	Limits
Free	$0	1K episodes/month
Flex	$25/month	20K credits, 600 req/min
Flex Plus	$475/month	300K credits, 1K req/min, webhooks
Enterprise	Custom	SOC2, HIPAA, dedicated support

Letta

Tier	Price	Details
Open Source	Free	Self-hosted, full features
Letta Cloud	TBD	Managed hosting, pricing not finalized

Letta's open-source model means you can run the full system for free if you self-host. Zep's credit-based model provides managed infrastructure but costs scale with usage. Letta Cloud pricing is still TBD, making long-term cost comparison difficult.
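To see how Zep's cost scales with usage, the paid tiers can be compared on price per included credit. The figures come from the pricing table above. The sketch assumes credits are the only billed unit and ignores overage charges, rate limits, and features such as webhooks, none of which are specified here.

```python
from typing import Optional

# Plan figures copied from the pricing table above.
PLANS = {
    "Flex": {"monthly_usd": 25, "credits": 20_000},
    "Flex Plus": {"monthly_usd": 475, "credits": 300_000},
}


def usd_per_1k_credits(plan: str) -> float:
    """Effective price of 1,000 included credits on a plan."""
    p = PLANS[plan]
    return p["monthly_usd"] * 1_000 / p["credits"]


def cheapest_plan(credits_needed: int) -> Optional[str]:
    """Cheapest plan whose included credits cover the need (no overage modelled)."""
    covering = [
        (p["monthly_usd"], name)
        for name, p in PLANS.items()
        if p["credits"] >= credits_needed
    ]
    return min(covering)[1] if covering else None


print(usd_per_1k_credits("Flex"))                 # 1.25
print(round(usd_per_1k_credits("Flex Plus"), 2))  # 1.58
print(cheapest_plan(50_000))                      # Flex Plus
```

On these numbers the larger tier costs slightly more per credit, so the step up buys headroom, higher request limits, and webhooks rather than a volume discount.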

For self-hosting: Letta is simpler (Docker or Python package). Zep requires running Graphiti with a graph database (Neo4j/FalkorDB/Kuzu), adding operational complexity.

When to Choose Zep

Choose Zep if you:

Need temporal queries ("what was true last month?")
Require enterprise compliance (SOC2, HIPAA)

Zep's temporal graph is useful for knowledge updates, though the architecture hasn't evolved much since launch.

When to Choose Letta

Choose Letta if you:

Want agents that manage their own memory autonomously
Need a research-backed approach with active development

Letta's self-editing model is novel but hasn't published standardized benchmark scores, making accuracy claims hard to verify.

Consider Hypabase

Both Zep and Letta struggle with a core challenge: reliably capturing what actually changed. Zep's graph edges depend on LLM-prompted extraction that can miss or misparse updates. Letta's self-editing agents must decide what's worth storing—and that decision varies with model quality. Hypabase sidesteps both failure modes with a formal extraction framework.

Factor	Zep	Letta	Hypabase
Extraction	LLM-based into triples	Agent-controlled	AMR (formal linguistic framework)
Representation	Temporal triples	Three-tier storage	N-ary hyperedges
LongMemEval*	71.2%	Not published	87.4%
Personalization	—	—	100%

AMR Extraction + Hyperedges

Hypabase uses Abstract Meaning Representation (AMR)—a formal framework from computational linguistics—to extract facts into structured hyperedges. Unlike ad-hoc LLM prompts (Zep) or agent self-editing (Letta), AMR parsing follows defined grammar rules, producing PENMAN notation with karaka semantic roles:

"The project deadline was moved from Friday to next Wednesday"

Ad-hoc extraction (Zep):
  (project_deadline, moved_to, Wednesday)
  — Friday origin is lost

Agent self-edit (Letta):
  Core memory updated: "deadline is Wednesday"
  — Depends on agent deciding to update; may not record the change from Friday

AMR extraction (Hypabase):
  (moved :object deadline :attribute project :origin Friday :locus Wednesday)

Hypabase captures the full temporal shift in one hyperedge—where the deadline moved from, where it moved to, and what it applies to. Zep's triple loses the origin. Letta's agent may or may not record the change at all.
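The idea of an extraction that either parses into a complete role map or fails loudly can be shown with a minimal parser for the flat notation in the example above. This is not Hypabase's implementation and not a full PENMAN parser (no variables, nesting, or quoted strings); `parse_hyperedge` is a hypothetical helper written for this sketch.

```python
def parse_hyperedge(text: str) -> dict[str, str]:
    """Parse '(pred :role value ...)' into {'predicate': pred, role: value, ...}.

    Handles only the flat, single-level form used in the example above.
    Raises ValueError on malformed input.
    """
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("expression must be wrapped in parentheses")
    tokens = text[1:-1].split()
    if not tokens or tokens[0].startswith(":"):
        raise ValueError("missing predicate")
    edge = {"predicate": tokens[0]}
    rest = tokens[1:]
    if len(rest) % 2 != 0:
        raise ValueError("every role needs exactly one value")
    for role, value in zip(rest[0::2], rest[1::2]):
        if not role.startswith(":") or value.startswith(":"):
            raise ValueError(f"expected ':role value', got {role!r} {value!r}")
        if role[1:] in edge:
            raise ValueError(f"duplicate role {role!r}")
        edge[role[1:]] = value
    return edge


edge = parse_hyperedge(
    "(moved :object deadline :attribute project :origin Friday :locus Wednesday)"
)
print(edge["origin"], "->", edge["locus"])  # Friday -> Wednesday

try:
    parse_hyperedge("(moved :origin Friday :locus)")  # the new value is missing
except ValueError as err:
    print("rejected:", err)  # rejected: every role needs exactly one value
```

Because both the old and new values live in one record, a consumer can read the change directly instead of reconstructing it from separate facts.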

Why This Matters for Temporal Updates

Benefit	How AMR + Hyperedges Deliver It
Complete change tracking	`:origin` and `:locus` roles capture both old and new values in a single fact
No agent judgment required	AMR parsing is deterministic—no reliance on the model choosing to update memory
Parseable output	PENMAN notation has defined grammar; malformed extractions caught at parse time
No fragmentation	The entire schedule change is one atomic hyperedge, not scattered across triples or tiers

This is why Hypabase achieves 100% on personalization tasks—every fact is extracted with its full relational structure, regardless of whether an agent "decided" it was important.


FAQ

Is Zep better than Letta?

They solve different problems. Zep (71.2% self-reported) excels at temporal reasoning with structured knowledge graphs. Letta offers agent autonomy but hasn't published comparable benchmark scores. For higher accuracy with structured extraction, consider Hypabase (87.4%).

Can I migrate from Zep to Letta?

There's no direct migration path—they use fundamentally different architectures. Zep stores temporal graph data while Letta uses tiered memory managed by the agent. Migration requires re-ingesting conversation history through the new system.
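Re-ingestion amounts to replaying exported messages through the target system in order. The sketch below assumes the history has already been exported into a simple `Message` record. The `ingest` callable is a placeholder for whatever call the target system exposes, and no real Zep or Letta API is assumed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable


@dataclass(frozen=True)
class Message:
    """One exported conversation turn (hypothetical export format)."""
    session_id: str
    role: str          # e.g. "user" or "assistant"
    content: str
    created_at: datetime


def replay_history(messages: Iterable[Message],
                   ingest: Callable[[Message], None]) -> int:
    """Feed messages to the target system, session by session, oldest first.

    Returns the number of messages replayed.
    """
    count = 0
    for msg in sorted(messages, key=lambda m: (m.session_id, m.created_at)):
        ingest(msg)
        count += 1
    return count


exported = [
    Message("s1", "user", "I now work at Globex", datetime(2025, 2, 1, 9, 0)),
    Message("s1", "user", "I work at Acme", datetime(2023, 3, 1, 9, 0)),
]

received: list[str] = []
replayed = replay_history(exported, lambda m: received.append(m.content))
print(replayed)   # 2
print(received)   # ['I work at Acme', 'I now work at Globex']
```

Ordering matters in both directions of a migration: a temporal graph derives validity windows from when facts arrive, and a self-editing agent overwrites core memory with whatever it saw last.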

What's the main difference?

Zep optimizes for temporal accuracy with graph-based reasoning. Letta optimizes for agent autonomy in memory management. Hypabase optimizes for extraction quality using AMR and structured hyperedge representation.

Which is better for self-hosting?

Letta is easier to self-host (Docker or Python package with no graph database required). Zep requires running Graphiti plus a graph database (Neo4j/FalkorDB/Kuzu). Hypabase runs entirely in a single SQLite file with no external database required—the simplest self-hosting option.

Conclusion

Zep adds temporal reasoning and self-reports 71.2% on LongMemEval (independent evaluation shows 63.8%). It is useful for knowledge updates but requires graph database infrastructure.

Letta offers a novel agent-controlled memory model from UC Berkeley research but hasn't published standardized benchmark scores, making performance claims difficult to evaluate.

Hypabase achieves 87.4% on LongMemEval through AMR-based extraction into hyperedges, a structured knowledge representation that preserves the relationships ad-hoc extraction fragments. It also scores 100% on personalization tasks.

All three are straightforward to integrate.


* LongMemEval scores: Zep (71.2%) self-reported; independent evaluation shows 63.8% (arXiv:2512.13564). Letta has not published LongMemEval scores. Hypabase (87.4%) from published benchmark harness.