I Overengineered an AI Agent Memory System, Then Threw It Away for Markdown


I spent two months building a long-term memory system for my AI agents. Vector store, embeddings, LLM-driven compression, temporal decay, MCP exposure. It was technically finished, documented, running smoothly. I never used it once.

On 2026-05-08, I archived the whole thing and replaced it with a Git repo of Markdown files. That repo has been in daily use ever since.

What I built (and never used)

ENGRAM v1 was a memory layer for AI agents: SQLite for relational storage, Qdrant for vector search, FastEmbed for embeddings, an LLM compression pass over old fragments, a temporal decay mechanism, and an MCP (Model Context Protocol) interface so any agent could query it.
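For readers unfamiliar with temporal decay, here is a minimal sketch of the general idea: a retrieval score is down-weighted as a memory ages. The exponential half-life formula, the function name `decayed_score`, and the 30-day default are assumptions for illustration; the post does not say how ENGRAM v1 computed decay.

```python
from datetime import datetime, timedelta, timezone


def decayed_score(
    similarity: float,
    last_used: datetime,
    now: datetime,
    half_life_days: float = 30.0,  # hypothetical default, not from ENGRAM v1
) -> float:
    """Halve the weight of a memory every `half_life_days` since last use."""
    age_days = max((now - last_used).total_seconds() / 86400.0, 0.0)
    return similarity * 0.5 ** (age_days / half_life_days)


if __name__ == "__main__":
    now = datetime(2026, 5, 8, tzinfo=timezone.utc)
    # A slightly weaker but recent match versus a stronger but old one.
    fresh = decayed_score(0.80, now - timedelta(days=1), now)
    stale = decayed_score(0.90, now - timedelta(days=90), now)
    print(f"fresh={fresh:.3f} stale={stale:.3f}")
```

With a scheme like this, an old fragment needs a much stronger match than a recent one to surface, which is one more parameter to tune and maintain.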

The design came from three places: a livestream of competent devs building agent memory on pgvector, the broader RAG patterns I wanted to learn, and the assumption that “real” agent memory needs semantic retrieval. Two months of work, clean architecture, zero usage.

Why it became an orphan

Three weeks after finishing v1, my Claude Code sessions were still running without it. My personal projects didn’t reach for it. I told myself it just needed more polish, better connectors, one more iteration. The classic script of a project that doesn’t solve a felt problem.

Then Anthropic shipped two things in a single week that crystallized the issue:

  • A persistent memory feature for Claude Managed Agents, built on a filesystem design where memories are files the model reads, writes, and greps with standard tools.
  • Auto Dream, a memory consolidation system whose target structure is a MEMORY.md index plus thematic Markdown files.

Two releases, same direction, same week. Meanwhile my vector store sat idle.
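The "index plus thematic files" shape is easy to picture in code. The sketch below regenerates a MEMORY.md index that links to every other Markdown file in a directory. The function names, the link format, and the "first `# ` heading as title" rule are assumptions for illustration, not the format Anthropic uses.

```python
import tempfile
from pathlib import Path


def first_heading(path: Path) -> str:
    """Return the first '# ' heading of a file, or its stem if none exists."""
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return path.stem


def build_index(repo: Path) -> str:
    """Write MEMORY.md with one link per thematic Markdown file."""
    lines = ["# Memory index", ""]
    for path in sorted(repo.rglob("*.md")):
        if path.name == "MEMORY.md":
            continue  # the index does not list itself
        rel = path.relative_to(repo).as_posix()
        lines.append(f"- [{first_heading(path)}]({rel})")
    text = "\n".join(lines) + "\n"
    (repo / "MEMORY.md").write_text(text, encoding="utf-8")
    return text


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "deploy.md").write_text("# Deployment notes\n", encoding="utf-8")
        (repo / "style.md").write_text("no heading here\n", encoding="utf-8")
        print(build_index(repo))
```

An agent can read the index first and then open only the thematic files that look relevant, with no retrieval service involved.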

The pivot to Markdown

ENGRAM v2 is a Git repo of structured Markdown files plus a tiny Claude Code skill. No Qdrant, no FastEmbed, no compression, no decay, no service to maintain. The repo does the work on its own: it is human-readable, versioned by Git, and manipulable by any LLM through the file operations it already knows (read, grep, write).
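To make the read, grep, write loop concrete, here is a minimal sketch of the two operations an agent or a script would need. The file layout (one file per topic, dated `##` entries) and the names `write_note` and `grep_notes` are assumptions for illustration, not the actual ENGRAM v2 skill.

```python
import re
import tempfile
from datetime import date
from pathlib import Path


def write_note(repo: Path, topic: str, text: str, today: date) -> Path:
    """Append a dated entry to <topic>.md, creating the file if needed."""
    path = repo / f"{topic}.md"
    if not path.exists():
        path.write_text(f"# {topic}\n", encoding="utf-8")
    with path.open("a", encoding="utf-8") as fh:
        fh.write(f"\n## {today.isoformat()}\n\n{text.strip()}\n")
    return path


def grep_notes(repo: Path, pattern: str) -> list[tuple[str, int, str]]:
    """Case-insensitive regex search over all Markdown files in the repo."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(repo.rglob("*.md")):
        content = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(content.splitlines(), start=1):
            if rx.search(line):
                rel = path.relative_to(repo).as_posix()
                hits.append((rel, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        write_note(repo, "deploy", "Staging uses Postgres 16.", date(2026, 5, 8))
        for rel, lineno, line in grep_notes(repo, "postgres"):
            print(f"{rel}:{lineno}: {line}")
```

Versioning stays outside the script: a plain `git add` and `git commit` after each session is enough to keep history.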

What changed overnight: I actually use it. Not out of discipline. Because writing a Markdown note is faster than re-explaining the same context to three different sessions.

That’s the real test for any personal system. With no external pressure, do you reach for it? If yes, the design is right regardless of its sophistication. If no, the design isn’t solving a real problem regardless of its sophistication.

“But vector stores are good, right?”

Yes. They’re the right answer for semantic search over massive heterogeneous corpora, RAG on millions of pages, and similarity over non-textual data. My mistake wasn’t choosing a vector store; it was choosing a level of complexity that exceeded my actual problem. For a few hundred personal Markdown files consulted by me and a few agents, plain-text grep is more than enough.

The door stays open. If volume one day outgrows text search, or if I need semantic retrieval over a corpus I can’t hand-index, I’ll add a vector layer, but starting from a real observed need.

Why not just use Anthropic’s memory feature?

Their design is what I converged on, and it’s good. But it’s coupled to Claude and hosted by Anthropic. I want my own memory layer to stay LLM-agnostic and self-hostable, so I can swap models freely in the future (other Claude versions, but also other providers if context, cost, or compliance ever demand it). A Markdown repo plus a small skill is the most portable substrate I can think of. Any model that can read and write text can use it.

What I take away

Three patterns I’ll keep:

  1. Personal R&D without dogfooding stays a theoretical asset. If after a few weeks I’m not reaching for the thing I built, it doesn’t solve a felt problem. Change the design, or honestly call it a learning project and move on.
  2. Radical simplicity is a value test. A system with five components has a high adoption cost. A system that fits in one repo plus one skill enters use immediately. For personal tooling, the second wins almost every time.
  3. Overengineering often comes from technical discomfort dressed up as ambition. When I catch myself building “the perfect solution” on a stack I don’t master, I’m usually conflating curiosity with a product need. Both are legitimate, just not in the same project.

ENGRAM v2 has been live for a week. I’ll know in a few weeks whether the pivot held. If I’m using it several times a week with a healthy set of files, the diagnosis was right. If not, I owe myself another honest post-mortem.