Zep and Letta represent two fundamentally different philosophies for agent memory. Zep uses a temporal knowledge graph (Graphiti) where time is a first-class dimension, while Letta (formerly MemGPT) gives agents control over their own memory through self-editing tiers.
This comparison covers their architectures, benchmark performance, pricing, and ideal use cases to help you decide.
Quick Comparison
| Factor | Zep | Letta |
|---|---|---|
| Architecture | Temporal knowledge graph (Graphiti) | Three-tier self-editing memory |
| LongMemEval* | 71.2% | Not published |
| Deployment | Cloud-first; self-host requires Graphiti + graph DB | Self-hosted (Docker/Python) or Letta Cloud |
| Pricing | Free / $25 / $475 / Enterprise | Open source / Cloud pricing TBD |
| GitHub Stars | 4.4K (Zep) + 24.8K (Graphiti) | 22K |
| Funding | Not disclosed | $10M seed (YC, Jeff Dean invested) |
What is Zep?
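Zep uses Graphiti, a temporal knowledge graph where time is a first-class dimension. Every fact carries valid_at and invalid_at timestamps for when it became true and when it stopped being true. These are kept separate from the timestamps recording when the system ingested or expired the fact. This allows queries like "what was true in January?" or "when did this change?"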
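The validity-window idea can be shown with a minimal Python sketch. It defines a fact edge that carries a start and end of validity and answers a point-in-time question. The class and function names are hypothetical. The field names valid_at and invalid_at are assumed, modeled on Graphiti's edge attributes. The sketch illustrates the data model only, not how Zep or Graphiti actually executes queries.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class FactEdge:
    """Hypothetical temporal fact: subject -[predicate]-> obj, with a validity window."""
    subject: str
    predicate: str
    obj: str
    valid_at: datetime                     # when the fact became true
    invalid_at: Optional[datetime] = None  # when it stopped being true; None means still true

    def holds_at(self, t: datetime) -> bool:
        # True if the fact had started by t and had not yet been invalidated at t.
        return self.valid_at <= t and (self.invalid_at is None or t < self.invalid_at)


def facts_true_at(edges: list[FactEdge], t: datetime) -> list[FactEdge]:
    """Answer 'what was true at time t?' by checking each edge's validity window."""
    return [e for e in edges if e.holds_at(t)]


edges = [
    FactEdge("alice", "works_at", "Acme", datetime(2023, 3, 1), invalid_at=datetime(2024, 2, 1)),
    FactEdge("alice", "works_at", "Globex", datetime(2024, 2, 1)),
]

print([e.obj for e in facts_true_at(edges, datetime(2024, 1, 15))])  # ['Acme']
print([e.obj for e in facts_true_at(edges, datetime(2024, 3, 1))])   # ['Globex']
```

The window is treated as half-open: the start is inclusive and the end is exclusive. As a result, on the day Alice switched jobs exactly one employer is current.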
Zep positions itself around "context engineering" rather than just memory. Graphiti, the underlying engine, has 24.8K stars and supports multiple graph backends (Neo4j, FalkorDB, Kuzu, Neptune).
Key strengths:
- Native temporal reasoning over fact validity windows
- Multi-hop graph queries
- <200ms retrieval latency
- Graphiti is open source (24.8K stars)
- Strong enterprise features (SOC2, HIPAA)
What is Letta?
Letta (formerly MemGPT) pioneered the concept of agents that manage their own memory through function calls. The project grew out of UC Berkeley research and is backed by $10M in seed funding (with investors including Jeff Dean). In Letta, the agent decides what's worth remembering and can edit its own memory tiers.
Letta organizes memory into three tiers: Core memory (always in the context window), Recall memory (searchable conversation cache), and Archival memory (long-term storage). The agent decides what moves between tiers.
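The tier structure can be sketched as a small Python class in which the agent, not the system, moves facts between tiers by calling tools. Everything here is hypothetical and is not the Letta API. That includes the class, the method names, the substring search and the tiny core_limit budget. The method names loosely echo MemGPT-style tools such as appending to core memory or inserting into archival memory.

```python
from dataclasses import dataclass, field


class CoreMemoryFull(Exception):
    """Raised when core memory is at capacity; the agent must free space itself."""


@dataclass
class TieredMemory:
    """Hypothetical three-tier memory in the spirit of MemGPT/Letta (not the Letta API)."""
    core_limit: int = 3                                # assumed in-context budget, in facts
    core: list[str] = field(default_factory=list)      # always placed in the prompt
    recall: list[str] = field(default_factory=list)    # searchable conversation history
    archival: list[str] = field(default_factory=list)  # long-term storage

    def log_message(self, msg: str) -> None:
        # Every conversation turn lands in recall memory automatically.
        self.recall.append(msg)

    # Tools the agent would invoke through function calls.
    def core_append(self, fact: str) -> None:
        if len(self.core) >= self.core_limit:
            raise CoreMemoryFull("archive something before adding to core memory")
        self.core.append(fact)

    def core_to_archival(self, fact: str) -> None:
        # The agent decides which core fact to demote; raises ValueError if absent.
        self.core.remove(fact)
        self.archival.append(fact)

    def archival_search(self, query: str) -> list[str]:
        return [f for f in self.archival if query.lower() in f.lower()]

    def recall_search(self, query: str) -> list[str]:
        return [m for m in self.recall if query.lower() in m.lower()]

    def context_window(self) -> str:
        # Only core memory is injected into every prompt.
        return "\n".join(self.core)


mem = TieredMemory(core_limit=2)
mem.log_message("user: I'm Sam and I just joined Globex")
mem.core_append("Name: Sam")
mem.core_append("Employer: Acme")
try:
    mem.core_append("Employer: Globex")
except CoreMemoryFull:
    # Here the "agent" chooses to demote the stale fact, then retries.
    mem.core_to_archival("Employer: Acme")
    mem.core_append("Employer: Globex")

print(mem.core)                      # ['Name: Sam', 'Employer: Globex']
print(mem.archival_search("acme"))   # ['Employer: Acme']
```

In a real agent, the try/except branch is the model's decision rather than hard-coded logic. That is where the less predictable behavior of this design comes from.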
Key strengths:
- Research-backed approach (UC Berkeley)
- Agent autonomy in memory management
- Active benchmark publishing (Letta Leaderboard)
- Strong coding agent focus (Letta Code)
- Well-funded with notable investors
Architecture Comparison
Zep's Approach
Zep's Graphiti engine stores entities as nodes and facts as edges in a knowledge graph, with explicit temporal metadata. Each edge carries a validity window tracking when the fact became true and when it was superseded.
This temporal awareness is native to the architecture—not a filter applied after retrieval. The tradeoff is infrastructure complexity: self-hosting requires running Graphiti plus a graph database.
Letta's Approach
Letta gives the agent itself control over memory management through function calls. The agent decides what to store, where to store it, and when to move information between tiers. Core memory stays in the context window, Recall memory is searchable, and Archival memory holds long-term facts.
This creates autonomous memory management but introduces less predictable behavior—the agent's memory decisions depend on the underlying model's judgment.
The Key Difference
Zep structures memory around time; Letta structures memory around agent autonomy.
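When a user changes jobs, Zep's temporal edges explicitly mark the old job as superseded and the new one as current. The system handles the invalidation rather than leaving it to the agent. However, deciding which existing fact a new one contradicts still relies on LLM-based extraction.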
In Letta, the agent must decide to update its own Core memory with the new job and archive the old one. Whether this happens correctly depends on the agent's judgment—a more flexible but less predictable approach.
This makes Zep more predictable for factual updates, while Letta offers more flexibility for agents that need to decide what matters.
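The job-change example can be sketched side by side. In the graph-style update, adding a new fact closes the validity window of the conflicting old fact, so history is kept. In the agent-style update, nothing changes unless the model chooses to call its edit tool, and an overwrite discards the old value. Both functions are hypothetical. Treating "same subject and predicate" as a contradiction is a stand-in assumption; Graphiti uses an LLM to decide which existing edges a new fact invalidates.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Edge:
    subject: str
    predicate: str
    obj: str
    valid_at: date
    invalid_at: Optional[date] = None


def add_with_invalidation(edges: list[Edge], new: Edge) -> None:
    """Graph-style update (hypothetical): close any current edge the new fact contradicts.

    Contradiction is approximated as 'same subject and predicate, still valid';
    a real system needs a smarter check (Graphiti uses an LLM for this).
    """
    for e in edges:
        if e.subject == new.subject and e.predicate == new.predicate and e.invalid_at is None:
            e.invalid_at = new.valid_at  # superseded, but kept for historical queries
    edges.append(new)


def agent_update(core: dict[str, str], key: str, value: str, agent_decides: bool) -> None:
    """Agent-style update (hypothetical): only happens if the model chooses to call the tool."""
    if agent_decides:
        core[key] = value  # plain overwrite: the old value is gone unless also archived


graph = [Edge("sam", "works_at", "Acme", date(2021, 5, 1))]
add_with_invalidation(graph, Edge("sam", "works_at", "Globex", date(2024, 6, 1)))
print([(e.obj, e.invalid_at) for e in graph])
# [('Acme', datetime.date(2024, 6, 1)), ('Globex', None)]

core = {"employer": "Acme"}
agent_update(core, "employer", "Globex", agent_decides=False)
print(core)  # {'employer': 'Acme'}  (the model skipped the update)
```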
Benchmark Comparison
| Benchmark | Zep | Letta |
|---|---|---|
| LongMemEval* | 71.2% | Not published |
Zep self-reports 71.2% on LongMemEval, though independent evaluation shows 63.8%. Letta has not published LongMemEval scores, instead maintaining its own benchmark suite (Letta Leaderboard).
The lack of a common benchmark makes direct comparison difficult. Without published scores, it's unclear how Letta's self-editing approach performs on standardized retrieval tasks.
Zep's self-reported score is well below that of Hypabase (87.4%), which uses AMR-based extraction for higher retrieval accuracy. Letta cannot be compared on this benchmark because it has not published a score.
Pricing Comparison
Zep
| Tier | Price | Limits |
|---|---|---|
| Free | $0 | 1K episodes/month |
| Flex | $25/month | 20K credits, 600 req/min |
| Flex Plus | $475/month | 300K credits, 1K req/min, webhooks |
| Enterprise | Custom | SOC2, HIPAA, dedicated support |
Letta
| Tier | Price | Details |
|---|---|---|
| Open Source | Free | Self-hosted, full features |
| Letta Cloud | TBD | Managed hosting, pricing not finalized |
Letta's open-source model means you can run the full system for free if you self-host. Zep's credit-based model provides managed infrastructure but costs scale with usage. Letta Cloud pricing is still TBD, making long-term cost comparison difficult.
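The listed prices allow a quick back-of-the-envelope comparison of Zep's paid tiers. The sketch below computes the effective price per 1K included credits and picks the smallest listed tier that covers a given monthly credit volume. It assumes usage is measured in the same credits the table lists. It leaves out the Free tier, which is metered in episodes rather than credits, and Enterprise, which has custom pricing. It does not model overage charges or what a single credit buys, since the article states neither.

```python
from typing import Optional

# Zep paid tiers from the pricing table: name -> (monthly price in USD, included credits).
TIERS = {
    "Flex": (25.0, 20_000),
    "Flex Plus": (475.0, 300_000),
}


def cost_per_1k_credits(price: float, credits: int) -> float:
    """Effective price per 1,000 included credits, assuming all credits are used."""
    return price / credits * 1_000


def smallest_covering_tier(monthly_credits: int) -> Optional[str]:
    """Cheapest listed tier whose included credits cover the usage; None if none does."""
    for name, (_price, credits) in sorted(TIERS.items(), key=lambda kv: kv[1][0]):
        if monthly_credits <= credits:
            return name
    return None  # beyond the self-serve tiers: Enterprise or overage territory


for name, (price, credits) in TIERS.items():
    print(f"{name}: ${cost_per_1k_credits(price, credits):.2f} per 1K credits")
# Flex: $1.25 per 1K credits
# Flex Plus: $1.58 per 1K credits

print(smallest_covering_tier(15_000))   # Flex
print(smallest_covering_tier(120_000))  # Flex Plus
```

On these numbers, Flex Plus costs about 27% more per credit than Flex. Its premium therefore pays for the higher rate limit and webhooks rather than cheaper volume.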
For self-hosting: Letta is simpler (Docker or Python package). Zep requires running Graphiti with a graph database (Neo4j/FalkorDB/Kuzu), adding operational complexity.
When to Choose Zep
Choose Zep if you:
- Need temporal queries ("what was true last month?")
- Require enterprise compliance (SOC2, HIPAA)
Zep's temporal graph is useful for knowledge updates, though the architecture hasn't evolved much since launch.
When to Choose Letta
Choose Letta if you:
- Want agents that manage their own memory autonomously
- Need a research-backed approach with active development
Letta's self-editing model is novel but hasn't published standardized benchmark scores, making accuracy claims hard to verify.
Consider Hypabase
Both Zep and Letta struggle with a core challenge: reliably capturing what actually changed. Zep's graph edges depend on LLM-prompted extraction that can miss or misparse updates. Letta's self-editing agents must decide what's worth storing—and that decision varies with model quality. Hypabase sidesteps both failure modes with a formal extraction framework.
| Factor | Zep | Letta | Hypabase |
|---|---|---|---|
| Extraction | LLM-based into triples | Agent-controlled | AMR (formal linguistic framework) |
| Representation | Temporal triples | Three-tier storage | N-ary hyperedges |
| LongMemEval* | 71.2% | Not published | 87.4% |
| Personalization | — | — | 100% |
Hypabase uses Abstract Meaning Representation (AMR)—a formal framework from computational linguistics—to extract facts into structured hyperedges. Unlike ad-hoc LLM prompts (Zep) or agent self-editing (Letta), AMR parsing follows defined grammar rules, producing PENMAN notation with karaka semantic roles:
"The project deadline was moved from Friday to next Wednesday"
Ad-hoc extraction (Zep):
(project_deadline, moved_to, Wednesday)
Result: the Friday origin is lost.
Agent self-edit (Letta):
Core memory updated: "deadline is Wednesday"
Result: depends on the agent deciding to update, and may not record the change from Friday.
AMR extraction (Hypabase):
(moved :object deadline :attribute project :origin Friday :locus Wednesday)
Hypabase captures the full temporal shift in one hyperedge—where the deadline moved from, where it moved to, and what it applies to. Zep's triple loses the origin. Letta's agent may or may not record the change at all.
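The difference between the two representations shows up as soon as you ask about the old value. In this minimal sketch, a fact is either a plain (subject, relation, object) triple or a hypothetical n-ary hyperedge. A hyperedge is one predicate plus named role fillers, using the role labels from the extraction example. The classes and query helpers are illustrative assumptions, not Hypabase's or Zep's data model. A triple store could recover the origin by emitting a second triple or reifying the event. The point is only that the single triple shown does not carry it.

```python
from dataclasses import dataclass, field
from typing import Optional

Triple = tuple[str, str, str]  # (subject, relation, object)


@dataclass
class Hyperedge:
    """Hypothetical n-ary fact: a predicate plus any number of labeled roles."""
    predicate: str
    roles: dict[str, str] = field(default_factory=dict)


def origin_from_triples(triples: list[Triple], subject: str) -> Optional[str]:
    # A triple has one object slot; the old value would need its own 'moved_from' triple.
    for s, rel, o in triples:
        if s == subject and rel == "moved_from":
            return o
    return None


def origin_from_hyperedges(edges: list[Hyperedge], obj: str) -> Optional[str]:
    # Old and new values live in the same fact, under the origin and locus roles.
    for h in edges:
        if h.roles.get("object") == obj and "origin" in h.roles:
            return h.roles["origin"]
    return None


triples = [("project_deadline", "moved_to", "Wednesday")]
edges = [Hyperedge("moved", {"object": "deadline", "attribute": "project",
                             "origin": "Friday", "locus": "Wednesday"})]

print(origin_from_triples(triples, "project_deadline"))  # None
print(origin_from_hyperedges(edges, "deadline"))         # Friday
```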
Why This Matters for Temporal Updates
| Benefit | How AMR + Hyperedges Deliver It |
|---|---|
| Complete change tracking | :origin and :locus roles capture both old and new values in a single fact |
| No agent judgment required | AMR parsing is deterministic—no reliance on the model choosing to update memory |
| Parseable output | PENMAN notation has defined grammar; malformed extractions caught at parse time |
| No fragmentation | The entire schedule change is one atomic hyperedge, not scattered across triples or tiers |
This is why Hypabase achieves 100% on personalization tasks—every fact is extracted with its full relational structure, regardless of whether an agent "decided" it was important.
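The "caught at parse time" property can be illustrated with a small validator for the flat, single-node form used in the extraction example: (predicate :role value ...). This is a hypothetical sketch, not Hypabase's implementation, and not a full PENMAN parser. Real PENMAN also has variable / concept node labels, nested nodes and quoted strings. The validator turns a well-formed string into a predicate plus a role dictionary. It raises on unbalanced parentheses, dangling roles, duplicate roles or stray characters.

```python
import re

# A token is a parenthesis, a :role label, or a bare symbol.
TOKEN = re.compile(r"\(|\)|:[A-Za-z][\w-]*|[^\s():]+")


class ExtractionParseError(ValueError):
    """Raised when an extraction string is not a well-formed flat node."""


def parse_flat_node(text: str) -> tuple[str, dict[str, str]]:
    """Parse '(predicate :role value ...)' into (predicate, {role: value})."""
    tokens = TOKEN.findall(text)
    # Every non-whitespace character must belong to some token (no stray characters).
    if "".join(tokens) != re.sub(r"\s+", "", text):
        raise ExtractionParseError("unexpected characters")
    if len(tokens) < 3 or tokens[0] != "(" or tokens[-1] != ")":
        raise ExtractionParseError("expected one parenthesized node")
    head, rest = tokens[1], tokens[2:-1]
    if head.startswith(":") or head in ("(", ")"):
        raise ExtractionParseError("missing predicate")
    if len(rest) % 2:
        raise ExtractionParseError("role without a value")
    roles: dict[str, str] = {}
    for role, value in zip(rest[0::2], rest[1::2]):
        if not role.startswith(":") or value.startswith(":") or value in ("(", ")"):
            raise ExtractionParseError(f"bad role/value pair: {role} {value}")
        if role[1:] in roles:
            raise ExtractionParseError(f"duplicate role {role}")
        roles[role[1:]] = value
    return head, roles


head, roles = parse_flat_node(
    "(moved :object deadline :attribute project :origin Friday :locus Wednesday)")
print(head, roles)
# moved {'object': 'deadline', 'attribute': 'project', 'origin': 'Friday', 'locus': 'Wednesday'}

for bad in ["(moved :object deadline :origin)",  # dangling role
            "(moved :object deadline",           # unbalanced
            "(moved : object deadline)"]:        # stray ':'
    try:
        parse_flat_node(bad)
    except ExtractionParseError as err:
        print("rejected:", err)
```

Nested nodes are rejected rather than silently flattened, which keeps the failure mode loud. For full PENMAN graphs, the open-source penman Python package is a more complete option.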
FAQ
Is Zep better than Letta?
They solve different problems. Zep (71.2% self-reported) excels at temporal reasoning with structured knowledge graphs. Letta offers agent autonomy but hasn't published comparable benchmark scores. For higher accuracy with structured extraction, consider Hypabase (87.4%).
Can I migrate from Zep to Letta?
There's no direct migration path—they use fundamentally different architectures. Zep stores temporal graph data while Letta uses tiered memory managed by the agent. Migration requires re-ingesting conversation history through the new system.
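Re-ingestion is mostly a matter of replaying exported messages in order. The sketch below sorts history oldest first, so a temporal or self-editing system sees "works at Acme" before "moved to Globex". It then feeds the history to the new system in batches. The Message shape and the ingest callback are hypothetical. In practice, ingest would wrap whichever SDK call the target system provides for adding messages. Neither Zep's nor Letta's export format is assumed here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable


@dataclass
class Message:
    """Hypothetical exported message; real export formats will differ."""
    session_id: str
    role: str          # e.g. "user" or "assistant"
    content: str
    created_at: datetime


def replay_history(messages: Iterable[Message],
                   ingest: Callable[[list[Message]], None],
                   batch_size: int = 50) -> int:
    """Feed exported history to a new memory system, oldest first; returns the count sent."""
    ordered = sorted(messages, key=lambda m: m.created_at)  # order matters for updates
    sent = 0
    for start in range(0, len(ordered), batch_size):
        batch = ordered[start:start + batch_size]
        ingest(batch)  # hypothetical adapter around the target system's SDK
        sent += len(batch)
    return sent


history = [
    Message("s1", "user", "I moved to Globex", datetime(2024, 6, 1)),
    Message("s1", "user", "I work at Acme", datetime(2023, 1, 1)),
]
received: list[list[Message]] = []
print(replay_history(history, received.append, batch_size=1))  # 2
print([b[0].content for b in received])  # ['I work at Acme', 'I moved to Globex']
```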
What's the main difference?
Zep optimizes for temporal accuracy with graph-based reasoning. Letta optimizes for agent autonomy in memory management. Hypabase optimizes for extraction quality using AMR and structured hyperedge representation.
Which is better for self-hosting?
Letta is easier to self-host (Docker or Python package with no graph database required). Zep requires running Graphiti plus a graph database (Neo4j/FalkorDB/Kuzu). Hypabase runs entirely in a single SQLite file with no external database required—the simplest self-hosting option.
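To show why a single SQLite file can be enough, here is one hypothetical way to store n-ary hyperedges with Python's built-in sqlite3 module. It uses an edges table for predicates and a roles table holding one row per role filler. The schema is an assumption made for this example; the article does not describe Hypabase's actual storage layout.

```python
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a one-file store; pass e.g. 'memory.db' to persist to disk."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS edges (
            id        INTEGER PRIMARY KEY,
            predicate TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS roles (
            edge_id INTEGER NOT NULL REFERENCES edges(id),
            role    TEXT NOT NULL,
            value   TEXT NOT NULL,
            PRIMARY KEY (edge_id, role)
        );
    """)
    return conn


def add_hyperedge(conn: sqlite3.Connection, predicate: str, roles: dict[str, str]) -> int:
    """Insert one n-ary fact: a predicate row plus one row per role filler."""
    cur = conn.execute("INSERT INTO edges (predicate) VALUES (?)", (predicate,))
    edge_id = cur.lastrowid
    conn.executemany("INSERT INTO roles (edge_id, role, value) VALUES (?, ?, ?)",
                     [(edge_id, r, v) for r, v in roles.items()])
    conn.commit()
    return edge_id


def latest_role(conn: sqlite3.Connection, predicate: str, obj: str, wanted: str) -> Optional[str]:
    """Return the `wanted` role from the newest `predicate` edge whose object role is `obj`."""
    row = conn.execute("""
        SELECT w.value
        FROM edges e
        JOIN roles o ON o.edge_id = e.id AND o.role = 'object' AND o.value = ?
        JOIN roles w ON w.edge_id = e.id AND w.role = ?
        WHERE e.predicate = ?
        ORDER BY e.id DESC
        LIMIT 1
    """, (obj, wanted, predicate)).fetchone()
    return row[0] if row else None


conn = open_store()
add_hyperedge(conn, "moved", {"object": "deadline", "attribute": "project",
                              "origin": "Friday", "locus": "Wednesday"})
print(latest_role(conn, "moved", "deadline", "origin"))  # Friday
print(latest_role(conn, "moved", "deadline", "locus"))   # Wednesday
```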
Conclusion
Zep offers native temporal reasoning and self-reports 71.2% on LongMemEval (independent evaluation shows 63.8%). It is useful for knowledge updates, though self-hosting requires graph database infrastructure.
Letta offers a novel agent-controlled memory model from UC Berkeley research but hasn't published standardized benchmark scores, making performance claims difficult to evaluate.
Hypabase achieves 87.4% through AMR-based extraction into hyperedges, a structured knowledge representation that preserves relationships that ad-hoc extraction breaks apart. It also scores 100% on personalization tasks.
All three are straightforward to integrate.
*LongMemEval scores: Zep (71.2%) self-reported; independent evaluation shows 63.8% (arxiv:2512.13564). Letta has not published LongMemEval scores. Hypabase (87.4%) from published benchmark harness.