Building Ankoryn: Why I Wanted Persistent Memory Across AI Sessions and How I Built It
Every AI conversation I had felt like starting from scratch. No context, no memory, no continuity. Ankoryn started as a frustration and turned into the most technically interesting thing I have built.
I use AI constantly. For thinking through problems, drafting content, exploring ideas, working through technical decisions. But the thing that kept frustrating me was how every session started from zero.
You explain your context again. You re-establish what you are working on. You remind the model what it said two days ago. If you switch between models, you lose everything. The AI is powerful but it has no memory of you, no continuity, no sense of the broader work you are doing.
That friction compounded over time. I started keeping notes to paste in at the start of sessions. I tried various tools. Nothing felt right. So I built Ankoryn.
What I actually wanted
The core thing I wanted was simple to describe and hard to build. I wanted an AI workspace that remembered what I had told it, could summarise and compress older context so it did not eat the whole context window, and could route between different models depending on what I needed at any given moment.
Not a chat interface with a nicer design. A genuine workspace layer that sat on top of the models and managed memory, context and routing in a structured way.
I also wanted it to feel like a place to work, not just a place to chat. There is a difference. Chat interfaces are disposable. A workspace has persistence, structure, and a sense that the work you do there accumulates into something.
The memory problem
The hardest part of building this was memory management. Language models have a finite context window. If you just keep appending conversation history, you hit the limit fast and older context gets dropped. So the question is how you compress and summarise what has come before without losing the things that actually matter.
I built a summarisation layer that periodically condenses older conversation segments into compressed summaries, storing them in IndexedDB via Dexie.js. When a new session starts, the relevant summaries are retrieved and injected back into the context alongside the recent conversation history. The model gets enough context to feel continuous without the full raw history eating the window.
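The mechanics of that layer can be sketched in a few dozen lines. This is a minimal illustration, not Ankoryn's actual code: the Dexie.js/IndexedDB persistence is replaced by in-memory arrays, the threshold is an arbitrary placeholder, and `summarise` stands in for a model call that condenses a conversation segment.

```typescript
// Sketch of a rolling summarisation layer. Names and thresholds are
// illustrative; the real system persists summaries to IndexedDB via
// Dexie.js, replaced here by plain in-memory arrays.

type Message = { role: "user" | "assistant"; text: string };
type Summary = { coversUpTo: number; text: string };

const SUMMARISE_AFTER = 8; // raw messages kept before condensing kicks in

// Placeholder for a model call that condenses a segment into a summary.
function summarise(messages: Message[]): string {
  return `Summary of ${messages.length} messages: ` +
    messages.map(m => m.text.slice(0, 20)).join(" | ");
}

class MemoryStore {
  private raw: Message[] = [];
  private summaries: Summary[] = [];
  private condensedCount = 0; // raw messages already folded into summaries

  append(msg: Message): void {
    this.raw.push(msg);
    const fresh = this.raw.length - this.condensedCount;
    if (fresh > SUMMARISE_AFTER) {
      // Condense the older half of the fresh window, keep the recent half raw.
      const cut = this.condensedCount + Math.floor(fresh / 2);
      const segment = this.raw.slice(this.condensedCount, cut);
      this.summaries.push({ coversUpTo: cut, text: summarise(segment) });
      this.condensedCount = cut;
    }
  }

  // Context injected when a session resumes: compressed summaries first,
  // then the raw tail that has not been condensed yet.
  buildContext(): string[] {
    return [
      ...this.summaries.map(s => s.text),
      ...this.raw.slice(this.condensedCount).map(m => m.text),
    ];
  }
}
```

The key design point is that compression is incremental: each pass condenses only the oldest uncompressed segment, so recent turns stay verbatim while older ones degrade gracefully into summaries.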
Getting this right took longer than anything else in the project. The summarisation has to be aggressive enough to save space but careful enough not to throw away context that will matter later. I went through several iterations before it felt reliable.
Cross-model routing
The other thing I wanted was the ability to use different models for different tasks without having to manage multiple interfaces. OpenAI for some things, Gemini for others. The routing layer in Ankoryn lets me switch between them inside the same workspace without losing the thread of what I am working on.
This sounds straightforward but there are edge cases everywhere. Different models handle context differently. Token limits vary. The way you structure a system prompt that works well in one model does not always translate directly to another. A lot of the work was making the routing feel seamless even when the underlying models behave differently.
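The shape of that problem is easier to see in code. The sketch below is an assumption about how such a routing layer could look, not Ankoryn's implementation: one internal request format, a per-provider adapter that rewrites it (OpenAI-style APIs take the system prompt as a leading message; Gemini-style APIs take it as a separate field and use `model` instead of `assistant`), and a crude token estimate used to trim history to each model's limit. The limits and payload shapes here are simplified placeholders, not real SDK calls.

```typescript
// Hypothetical routing layer: one internal request shape, adapted
// per provider. Payload shapes and token limits are illustrative.

type ChatRequest = {
  system: string;
  messages: { role: string; text: string }[];
};

interface ProviderAdapter {
  maxContextTokens: number;
  toPayload(req: ChatRequest): any;
}

// Rough token estimate: ~4 characters per token.
function estimateTokens(req: ChatRequest): number {
  const chars = req.system.length +
    req.messages.reduce((n, m) => n + m.text.length, 0);
  return Math.ceil(chars / 4);
}

const adapters: Record<string, ProviderAdapter> = {
  // OpenAI-style: system prompt travels as the first message.
  openai: {
    maxContextTokens: 128_000,
    toPayload: (req) => ({
      messages: [
        { role: "system", content: req.system },
        ...req.messages.map(m => ({ role: m.role, content: m.text })),
      ],
    }),
  },
  // Gemini-style: system instruction is a separate top-level field,
  // and the assistant role is called "model".
  gemini: {
    maxContextTokens: 1_000_000,
    toPayload: (req) => ({
      systemInstruction: req.system,
      contents: req.messages.map(m => ({
        role: m.role === "assistant" ? "model" : "user",
        parts: [{ text: m.text }],
      })),
    }),
  },
};

// Drop the oldest messages until the request fits the target model,
// then hand off to the provider-specific adapter.
function route(provider: string, req: ChatRequest): any {
  const adapter = adapters[provider];
  const trimmed = { ...req, messages: [...req.messages] };
  while (trimmed.messages.length > 1 &&
         estimateTokens(trimmed) > adapter.maxContextTokens) {
    trimmed.messages.shift();
  }
  return adapter.toPayload(trimmed);
}
```

Keeping the trimming logic in the router, rather than in each adapter, is what lets a switch mid-conversation preserve the thread: the same internal history is simply re-fitted to whichever model is active.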
What I use it for now
Ankoryn is my primary AI interface for serious work. When I am working through a complex technical problem, writing something that needs multiple drafts, or planning a new project, I do it in Ankoryn because I know the context will be there when I come back to it.
It is also where I test a lot of ideas before they become products. The workspace structure means I can keep separate threads for different projects without them bleeding into each other.
What building it taught me
The biggest lesson was about the gap between what AI demos show and what reliable AI-powered products actually need. A demo of persistent memory is easy. A system that manages memory correctly across hundreds of sessions, different models, varying context lengths and real user behaviour is a different problem entirely.
I also learned a lot about context engineering, which I think is going to become a recognised discipline in the same way prompt engineering has. How you structure the information you give a model (what you include, what you compress, what you leave out) has a massive impact on output quality. Building Ankoryn forced me to think about this at a level I had not before.
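One way to make "what you include, what you compress, what you leave out" concrete is to treat context assembly as a budgeting problem. The sketch below is an illustration under assumed values, not Ankoryn's actual logic: each candidate piece of context gets a priority, and pieces are admitted in priority order until the token budget runs out.

```typescript
// Illustrative context budgeting: fixed instructions first, then
// summaries, then recent history, admitting whatever fits the budget.
// Priorities and the chars-per-token ratio are assumptions.

type Piece = { text: string; priority: number }; // lower = more important

function assembleContext(pieces: Piece[], budgetTokens: number): string[] {
  const charsPerToken = 4; // rough estimate
  const chosen: string[] = [];
  let used = 0;
  for (const p of [...pieces].sort((a, b) => a.priority - b.priority)) {
    const cost = Math.ceil(p.text.length / charsPerToken);
    if (used + cost > budgetTokens) continue; // leave out what does not fit
    chosen.push(p.text);
    used += cost;
  }
  return chosen;
}
```

The interesting part is not the loop but the priorities: deciding that a two-week-old summary outranks yesterday's raw transcript, or vice versa, is exactly the judgment call that makes context engineering a discipline rather than a utility function.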
If you are building anything AI-powered and you are not thinking carefully about context management, that is probably where your quality problems are coming from.
