AI agents are smart but forgetful.
After 700 coding sessions, Tobi Lütke — CEO of Shopify, a $100B company — got fed up. Every session started from scratch. Context from yesterday's debugging? Gone. Architectural decisions from last week? Vanished. The agent kept rediscovering the same things.
So he built QMD — Query Markdown Documents. An on-device search engine that gives AI agents persistent memory across sessions. 21,000+ stars on GitHub. Entirely local. No cloud. No API calls.
This is the infrastructure layer that makes AI-first workflows actually work.
The Core Insight: Search as Memory
Most AI memory solutions send your data to the cloud. QMD flips this. Everything runs on your machine using local GGUF models via node-llama-cpp.
Three search modes work together:
- BM25 lexical search — fast keyword matching, handles exact terms
- Vector semantic search — understands meaning, finds conceptually similar content
- Hybrid mode — runs both in parallel, fuses results with LLM re-ranking, hits 89% accuracy
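The fusion step in hybrid mode can be sketched with reciprocal-rank fusion (RRF), a standard way to merge ranked lists from different retrievers. This is an illustration of the general technique, not QMD's published implementation; QMD additionally re-ranks the fused results with a local LLM.

```python
# Sketch of hybrid retrieval fusion via reciprocal-rank fusion (RRF).
# Illustrative only: QMD's actual fusion and LLM re-ranking may differ.

def rrf_fuse(rankings, k=60):
    """Merge several ranked result lists; documents near the top of
    multiple lists accumulate the highest fused score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from the two retrievers:
bm25_hits = ["auth.md", "billing.md", "rate-limit.md"]   # exact-term matches
vector_hits = ["sessions.md", "auth.md", "billing.md"]   # semantic matches

fused = rrf_fuse([bm25_hits, vector_hits])
# "auth.md" wins: it ranks high in both lists
```

The point of RRF is that it needs no score normalization: BM25 scores and cosine similarities live on different scales, but ranks are always comparable.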
The insight: you don't need massive cloud infrastructure. A well-tuned local system with the right architecture outperforms naive cloud embeddings.
The Killer Feature: Hierarchical Context
This is what makes QMD actually useful for agents, not just humans.
When you index a folder, you attach context at any level:
```bash
qmd collection add ~/work/api-service --name api
qmd context add api "API service for customer-facing product. Handles auth, rate limiting, billing integration."
qmd context add api/src/auth "Authentication module. Uses JWT with refresh tokens. Redis session store."
```
When a search matches a document in api/src/auth/, the agent doesn't just get the file. It gets all parent context attached. The why travels with the what.
This is hierarchical memory. The agent understands not just what a file contains, but where it sits in the larger system.
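A minimal sketch of the idea: walk a matched document's path upward and collect every ancestor's context, outermost first. The names here (CONTEXTS, attach_context) are illustrative, not QMD's API.

```python
# Hypothetical sketch of parent context traveling with a search hit.
# Not QMD's actual code; it shows the hierarchical-lookup idea only.
from pathlib import PurePosixPath

# Context notes keyed by path prefix, as set via `qmd context add`:
CONTEXTS = {
    "api": "API service for customer-facing product.",
    "api/src/auth": "Authentication module. JWT with refresh tokens.",
}

def attach_context(doc_path):
    """Collect context for every ancestor directory of doc_path,
    from the collection root down to the document's own folder."""
    parts = PurePosixPath(doc_path).parent.parts  # drop the filename
    notes = []
    for i in range(1, len(parts) + 1):
        prefix = "/".join(parts[:i])
        if prefix in CONTEXTS:
            notes.append(CONTEXTS[prefix])
    return notes

notes = attach_context("api/src/auth/tokens.md")
# service-level context first, then the auth-module context
```

A hit on tokens.md arrives with both the service-level "why" and the module-level "how", which is exactly what lets an agent skip re-deriving where a file sits in the system.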
Architecture: Three Components
1. Collections — Directories of markdown files. Your notes, docs, meeting transcripts, session logs. QMD chunks them into searchable units while preserving document structure.
2. Embeddings — Vector representations generated locally. QMD uses a LoRA-tuned Qwen 1.7B model for query expansion before embedding. Your queries get semantically enriched before search.
3. MCP Server — QMD exposes itself as an MCP tool. Claude Code (or any MCP-compatible agent) can search your knowledge base mid-session. The agent calls qmd_search, gets results with context, continues working.
All of it runs in under a second on a MacBook.
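On the wire, an agent reaching QMD through MCP sends a JSON-RPC 2.0 tools/call request. The tool name qmd_search comes from the article; the exact argument schema below is an assumption for illustration.

```python
# What an agent's MCP call to QMD might look like (JSON-RPC 2.0).
# The "arguments" schema here is assumed, not taken from QMD's docs.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "qmd_search",
        "arguments": {
            "query": "how do refresh tokens rotate?",  # natural-language query
            "limit": 5,                                 # assumed parameter
        },
    },
}

print(json.dumps(request, indent=2))
```

The server replies with matching chunks plus their attached hierarchical context, and the agent folds that into its working context without ever leaving the machine.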
Why Tobi Built This: The AI-First Mandate
Tobi wrote an internal memo at Shopify that made waves: "AI use is no longer optional. Before requesting headcount, teams must demonstrate why AI cannot do the job."
Strong stance. But without persistent memory, AI tools hit a ceiling. They're brilliant in the moment, useless across sessions. You can't build an AI-first culture on tools that forget everything overnight.
QMD is the missing infrastructure. Not better models. Better memory.
The fact that the CEO of a $100B company is personally shipping developer tools tells you where this is heading. Memory isn't a nice-to-have. It's an integral element of any serious AI workflow.
What QMD Gets Right
Local-first. Your proprietary code never leaves your machine. For a CEO handling Shopify's codebase, this isn't optional.
Hierarchical context. Not just search results — search results with provenance. The agent knows where knowledge comes from and what system it belongs to.
Session persistence. Index your Claude Code session logs. Past conversations become searchable knowledge. The agent can find what you discussed last Tuesday.
Hybrid retrieval. Keyword search catches exact terms. Semantic search catches concepts. Re-ranking picks the best. Three layers are better than one.
What It Doesn't Solve
QMD is retrieval infrastructure. Excellent retrieval infrastructure. But retrieval is only part of the memory problem.
Temporal tracking. QMD doesn't know that a document from March contradicts one from April. It returns both. You sort it out.
Consolidation. If your notes contain conflicting information, QMD surfaces the conflict but doesn't resolve it.
Decay. Everything stays indexed forever. No mechanism for graceful forgetting. Eventually, old noise drowns new signal.
Verification. QMD tells you where results came from. But it doesn't verify whether those sources are still accurate.
These aren't criticisms — they're boundaries. QMD does retrieval exceptionally well. The other problems require different solutions.
What This Means for AI Agent Memory
QMD validates a principle: memory must be local, fast, and contextual.
Cloud latency kills agent flow. Privacy concerns block enterprise adoption. And flat search results without context force agents to re-derive relationships every time.
QMD solves these. But retrieval-first memory has inherent limits. You're still searching through raw documents. The synthesis happens at query time.
The six problems any serious memory system must address:
- Relationship — QMD's hierarchical context handles this elegantly
- Temporal — unaddressed
- Consolidation — unaddressed
- Decay — unaddressed
- Abstention — unaddressed
- Verification — unaddressed (source tracing exists, but no fact verification)
QMD represents a novel approach gaining serious traction in the developer community. It proves that on-device memory works. What it doesn't attempt is the harder problem: turning retrieved documents into structured, temporal, self-maintaining knowledge.
We're building Hypabase to close those gaps — memory that consolidates automatically, tracks time, decays gracefully, and knows its own boundaries.
Key Takeaways
- Local-first wins. Privacy, latency, control. QMD proves you don't need the cloud for excellent retrieval.
- Hierarchical context matters. Search results without context force re-derivation. Context that travels with results compounds.
- Hybrid search beats single-mode. BM25 + vectors + re-ranking. Three layers outperform any single approach.
- Session logs are knowledge. Index your past conversations. What you discussed yesterday should inform what you do today.
- Memory is infrastructure. Not a feature. The foundation that makes AI-first workflows possible.
QMD isn't a memory system. It's a retrieval system. And it's quickly becoming a go-to solution for developers who want local-first search. For anyone serious about AI-assisted development, it's now an integral part of the stack.
Try Hypabase Memory — agentic memory that goes beyond retrieval.
Related: Karpathy's LLM Wiki: Why the Future of AI Memory Isn't RAG | Garry Tan's GBrain: The Memex We Were Promised | Why You Need a Context Layer for Your AI Agent