AI Memory

Sam Whitmore, CEO of New Computer

How New Computer evolved its AI memory architecture for its conversational journal "Dot" in response to large context windows.

Sam Whitmore, CEO of New Computer, shares the journey of building memory for their conversational AI, Dot. She details their initial attempts, the lessons learned from over-engineering, and how the recent explosion in LLM context window size is forcing them to rethink the fundamental principles of AI memory, moving away from compression towards querying raw data.

Insights

  • Simple fact extraction fails: It strips conversations of their essential nuance and emotional context.
  • Complex schemas create cognitive overhead: Exposing an editable memory database to users can be stressful and counter-productive.
  • Procedural memory enables agency: Triggering memory on situational similarity (e.g., recognizing a planning scenario) allows an AI to learn and reuse conversational workflows.
  • Hybrid search is pragmatic: Combining keyword, semantic, and BM25 (sparse lexical) search is more effective than committing to a single retrieval method.
  • "Franken-prompts" are robust: Dynamically building prompts from multiple, parallel memory sources makes them a moving target for injection attacks.
  • Large context windows change the game: The need for aggressive context compression diminishes as models become cheaper and can handle 1M+ tokens.
  • Raw data is the source of truth: When possible, query original conversation logs directly rather than relying on intermediate summaries, which are lossy by nature.
  • The future of memory is interpretation: As retrieval becomes a commodity, the most valuable "memory" will be the AI's own analysis and insights about the user.
  • There is no perfect memory architecture: Start with the product's core user experience and continuously re-evaluate your approach as the underlying technology evolves.

Main Ideas

1. Evolution from Structured Data to Raw Logs

  • Early attempts to build memory by extracting discrete facts ("user has dog") or creating complex, linked schemas proved brittle and inefficient. The former lost nuance, while the latter created too much engineering and user overhead.
  • The team simplified their approach by stripping down JSON schemas, finding that complex structures did not significantly improve retrieval performance.
  • With large, cheap, and fast context windows, the team is now experimenting with eliminating episodic and entity-level compression entirely, opting for real-time Q&A over raw conversation history.
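
A minimal sketch of what that last bullet implies, assuming a hypothetical llm_complete client and message shape (this is not New Computer's actual code): skip the summary layers and answer questions over the relevant slice of raw conversation log directly.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for any chat-completion API call.
def llm_complete(prompt: str) -> str:
    raise NotImplementedError("wire up your model client here")

def answer_from_raw_log(question: str, log: list[dict], days: int = 30) -> str:
    """Real-time Q&A over raw conversation history: no episodic or
    entity-level summaries, just the original messages in context."""
    cutoff = datetime.now() - timedelta(days=days)
    recent = [m for m in log if m["timestamp"] >= cutoff]
    transcript = "\n".join(
        f"[{m['timestamp']:%Y-%m-%d %H:%M}] {m['role']}: {m['text']}"
        for m in recent
    )
    # With million-token context windows, the raw log often fits as-is,
    # so no lossy intermediate summary is needed.
    prompt = (
        "Below is a raw conversation log between a user and their journal.\n\n"
        f"{transcript}\n\n"
        f"Using only the log above, answer: {question}"
    )
    return llm_complete(prompt)
```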

2. A Four-Part, Parallel Memory System

  • New Computer treats memory as a multi-faceted problem, creating four parallel systems that are queried simultaneously and combined into a final "Franken-prompt."
  • The four systems are:
    • Holistic Theory of Mind: A stable, high-level understanding of the user's core values, goals, and identity. This is always loaded into context.
    • Episodic Memory: Time-based summaries of conversations (e.g., what happened yesterday or last week), triggered by temporal queries.
    • Entity Memory: Records of specific nouns like people, places, and concepts, retrieved using a hybrid search approach.
    • Procedural Memory: Behavioral workflows triggered by situational similarity. It contains the AI's notes on how to act in certain contexts, like "probing for a hidden emotion."
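
A rough sketch of how four parallel systems and a Franken-prompt could be wired together; the stub contents, section headers, and trigger heuristics below are invented for illustration, not New Computer's actual prompt format.

```python
import asyncio

# Hypothetical stubs for the four systems; real versions would query a
# vector store, a summaries table, and so on. Each returns text, possibly empty.
async def load_theory_of_mind(user_id: str) -> str:
    return "Core values, goals, and identity."  # always loaded into context

async def query_episodic(user_id: str, query: str) -> str:
    # Fires only on temporal queries ("yesterday", "last week", ...).
    temporal = any(t in query.lower() for t in ("yesterday", "last week"))
    return "Summary of recent conversations." if temporal else ""

async def query_entities(user_id: str, query: str) -> str:
    return "Poppy: the user's dog; walks are a daily highlight."

async def query_procedural(user_id: str, query: str) -> str:
    return ""  # loaded only when a situational trigger matches

async def build_franken_prompt(user_id: str, query: str) -> str:
    """Query all four memory systems in parallel and splice the non-empty
    results into one prompt whose shape shifts from turn to turn."""
    holistic, episodic, entities, procedural = await asyncio.gather(
        load_theory_of_mind(user_id),
        query_episodic(user_id, query),
        query_entities(user_id, query),
        query_procedural(user_id, query),
    )
    sections = [
        ("WHO THE USER IS", holistic),
        ("RELEVANT HISTORY", episodic),
        ("KNOWN ENTITIES", entities),
        ("BEHAVIORAL GUIDANCE", procedural),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections if body)

print(asyncio.run(build_franken_prompt("u1", "What did we talk about yesterday?")))
```

Because empty sections drop out and each source updates on its own cadence, the assembled prompt differs on nearly every turn, which is the property that frustrated injection attempts.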

3. Memory as Interpretation, Not Just Retrieval

  • As context windows approach infinity, the technical challenge of simply retrieving a fact from a log will diminish.
  • The value of AI memory will shift from recall to analysis. The most important "memories" will be the AI's own interpretations, insights, and learned behavioral patterns (procedural memory).
  • This reframes the problem: instead of building a perfect database, the goal is to create a system that generates and stores its own meta-cognitive notes about the user and their interactions.
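
One plausible data shape for interpretation-as-memory, with invented field names and an invented example note: interpretations become first-class records that point back at the raw log, which stays the lossless source of truth.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class InterpretiveNote:
    """One meta-cognitive note: the AI's own reading of the user, stored as a
    first-class record that points back at the raw conversation log."""
    text: str                      # e.g. "User deflects with humor when anxious."
    kind: str                      # "insight" | "pattern" | "procedural"
    source_message_ids: list[str]  # spans of raw log that support the note
    created_at: datetime = field(default_factory=datetime.now)

# The raw log stays the source of truth underneath; the notes layer carries
# what commodity retrieval can't provide on its own: interpretation.
notes = [
    InterpretiveNote(
        text="When planning social events, the user wants options, not decisions.",
        kind="procedural",
        source_message_ids=["msg_0412", "msg_0967"],
    )
]
```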

Transcript

00:00:12 Sam Whitmore: Thank you, Nicole. And thank you, Harrison and LangChain and Greg for organizing and hosting. Actually, one of the first things I did with memory was with Harrison on the original memory implementation in LangChain. So very full circle. Cool. So for those of you who do not know New Computer and what we do, we have Dot, which is a conversational journal. It's in the App Store. You can use it now. We launched this last year. So we've been working on memory in AI applications since 2023. Cool. So take us back to 2023: at the time, GPT-4 was state of the art, we had an 8,000-token prompt limit, very slow and very expensive.

00:00:56 Sam Whitmore: So I wanna walk you through some of the things that we tried initially, lessons we learned along the way, and how we kind of evolve as the underlying technology evolves. So when we started, our general goal was to build a personal AI that got to know you. It was pretty unstructured. And so we knew that if it was going to learn about you as you used it, it needed memory. So we're like, okay, let's just build the perfect memory architecture first and then the product after that. So we started out being like, okay, maybe we can just extract facts as a user talks to Dot and search across them, you know, use some different techniques, and we'll have great memory performance.

00:01:42 Sam Whitmore: So we learned pretty quickly that this wasn't really gonna work for us. So imagine a user saying, "I have a dog. His name is Poppy. Walking him is the best part of my day." With early extraction, we'd get things like: user has dog. User's dog is named Poppy. User likes taking Poppy for walks. There's a lot of nuance missing. So, like, you can tell a lot about a person from reading that sentence that you can't tell from those facts. That was a pretty quick realization for us. We then moved on. So we were like, maybe if we try to summarize everything about Poppy in one place, then it's going to perform better. We decided that we were going to make this universal memory architecture with entities and schemas that were linked to each other.

00:02:30 Sam Whitmore: This was a UI representation of it. So users could actually browse the things that were created, and they had different types, and on the back end there were different form factors with JSON blobs. This is a real example from our product at the time. So I sent it a bachelorette flyer and it made a whole bunch of different memory types with schemas associated. So you can see here that this is what the back-end data looked like. There's different fields, and we had a router architecture that would kind of generate queries that would search across all of these in parallel. And what we found was that it worked okay, but there was kind of some base functionality that was still missing.

00:03:15 Sam Whitmore: Oh, this was a funny example. Jason, my co-founder, was sending it pictures and it made him a drunk text category as a schema, which we're like, that feels like a heavy read. But anyway, the schemas were kind of fun. But yes. So basically, we also saw that when we exposed this to users, there was too much cognitive overhead for them to garden their database. Like, there were a lot of overlapping concepts, and people got stressed by actually just monitoring their memory base. So again, we're like, okay, let's just go back to basics here and figure out, like, what do we want our product to be doing?

00:03:56 Sam Whitmore: And let's reexamine how we wanna build memory from that. So we looked again at like what a thought partner should have to do to actually be really good as a listener for you. So we realized like, it should always know who you are and your core values. It should know basically like, you know, what you talked about yesterday, what you talked about last week. And again, like, who Poppy is, if Poppy is your dog, who your cofounder is, stuff like that. And it also needs to know about like your behavior preferences and how it should adapt to you as you use it. So we ended up making four kind of parallel memory systems.

00:04:31 Sam Whitmore: So the schemas that you saw didn't really go away; they just became one of the memory systems, the entities. And it's funny seeing Will kind of say some of the same ones. So it's like an example of convergent evolution, because we kind of made these up ourselves. But basically, like, holistic theory of mind, here's mine. It's kind of just like: who am I? What's important to me? What am I working on? What's top of mind for me now? Episodic memory is kind of like what happened on a specific day. Here are some actual real examples from soon after I had my baby last year. Here's another entity example. We ended up stripping away a lot of the JSON because it turned out to actually not improve performance in retrieval across the entity schema.

00:05:17 Sam Whitmore: So we kept things like the categories if we wanted to do tag filtering, but a lot of the extra structure just ended up being, like, way too much overhead for the model to output. And finally, we did this thing called procedural memory, which is basically triggered by conversational and situational similarity. So what you're looking at here is this intent, and if you're a Dot user, you'll probably recognize this behavior. It says: choose this if you have sensed a hidden or implied emotion or motivation that the user is not expressing and see a chance to share an insight or probe the user deeper on this matter.

00:05:52 Sam Whitmore: And then when it detects that this is happening, it says, like: share an insight, you know, ask a question, issue a statement that encourages the behavior. And so basically, the trigger here is not semantic similarity but situational similarity. I see a lot of overlap here for people building agents, where if you have a workflow that the agent needs to perform, it can identify that it encountered that situation before and kind of pull up some learning it had from the past running of the workflow. So this is kind of how our retrieval pipeline worked in 2024, which is, like, parallelized retrieval across all of these systems.
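
To make the trigger mechanism concrete, here is a minimal sketch. The module names, trigger text (a paraphrase of the intent quoted above), and the word-overlap similarity are all illustrative stand-ins; a production system would use embeddings or a classifier.

```python
from dataclasses import dataclass

@dataclass
class BehavioralModule:
    name: str
    trigger: str       # when to choose this module, in plain language
    instructions: str  # what to do once triggered

MODULES = [
    BehavioralModule(
        name="probe_hidden_emotion",
        trigger="A hidden or implied emotion or motivation the user is not "
                "expressing, with a chance to share an insight or probe deeper.",
        instructions="Share an insight, ask a question, or issue a statement "
                     "that encourages the user to open up.",
    ),
    BehavioralModule(
        name="plan_event",
        trigger="The user is planning an outing, trip, or event and needs "
                "help narrowing options.",
        instructions="Gather constraints, then propose a shortlist.",
    ),
]

def jaccard(a: str, b: str) -> float:
    """Toy word-overlap similarity; a real system would embed the texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def select_module(situation: str, threshold: float = 0.15) -> BehavioralModule | None:
    """Match on situational similarity: compare a model-written description of
    the current situation against each trigger, not the raw conversation text."""
    best = max(MODULES, key=lambda m: jaccard(situation, m.trigger))
    return best if jaccard(situation, best.trigger) >= threshold else None
```

The same shape maps onto agent workflows: a stored module is the learning from the last time the workflow ran, keyed by the situation rather than the wording.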

00:06:32 Sam Whitmore: So here's a query, which is very hard to read, so maybe these slides will be accessible separately: what restaurant should I take my brother to for his birthday? And then each of our four systems detects whether a query is necessary for that system. For the holistic stuff, we always load the whole theory of mind. Episodic is only triggered if it's like, what did we talk about last week, or what did we talk about yesterday? And then here, there are two different types of entity queries detected, like brother and restaurants. And then we would do kind of a hybrid search thing where we mix together BM25, semantic, keyword, basically with no attachment to any particular approach, just whatever improved recall for specific entities.
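
The talk doesn't say how the BM25, semantic, and keyword results get mixed; reciprocal rank fusion is one common, scale-free option, sketched here with made-up entity ids.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Blend several best-first rankings (e.g. from BM25, semantic, and keyword
    retrievers) without having to normalize their raw score scales."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Made-up entity ids for the "restaurant for my brother's birthday" query:
bm25_hits     = ["brother", "birthday_2023", "restaurants_to_try"]
semantic_hits = ["restaurants_to_try", "brother", "sister"]
keyword_hits  = ["brother", "restaurants_to_try"]
print(reciprocal_rank_fusion([bm25_hits, semantic_hits, keyword_hits]))
# brother and restaurants_to_try rise to the top across all three lists
```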

00:07:19 Sam Whitmore: And then the procedural memory: here, if there's a behavioral module loading, like restaurant selection or planning, then that would get loaded into the final prompt. So a funny thing also is, when we launched, people tried to prompt-inject us, but because we have so many different behavioral modules and different things going on, we called it the Franken-prompt. And if people did prompt injections, they'd be like, wait, I think this prompt changes every time, which it did. Okay. So for the formation of these, again, it's really distinct per system. So holistic theory of mind, you don't need to update that frequently.

00:07:56 Sam Whitmore: Episodic is, like, periodic summarization. So if you want it to be per week, you might update across daily summaries once per week; per day, once per day, etcetera. Entities we did per line of conversation, and then we would run kind of cron jobs that we called dream sequences, where they'd identify possible duplicates and potentially merge them. And procedural memory also updated per line of conversation. So over the past year, our product trajectory has changed. We're now building Dots, which is a hive mind. So instead of remembering just one person that it meets, it actually remembers an entire group of people.
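
A toy version of such a dream sequence, with difflib string similarity standing in for whatever duplicate detection the real cron jobs use:

```python
from difflib import SequenceMatcher

def dream_sequence(entities: list[dict], threshold: float = 0.85) -> list[dict]:
    """Periodic cleanup pass: fold near-duplicate entity records together.
    SequenceMatcher is a toy stand-in for real duplicate detection."""
    merged: list[dict] = []
    for ent in entities:
        dup = next(
            (m for m in merged
             if SequenceMatcher(None, m["name"].lower(),
                                ent["name"].lower()).ratio() >= threshold),
            None,
        )
        if dup:
            dup["notes"] += "\n" + ent["notes"]  # merge the duplicate's notes
        else:
            merged.append(dict(ent))
    return merged

# Case-variant duplicates collapse into a single record:
print(dream_sequence([
    {"name": "Poppy the dog", "notes": "The user's dog."},
    {"name": "poppy the Dog", "notes": "Walks are the best part of the day."},
]))
```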

00:08:41 Sam Whitmore: And yeah. So it's like many Dots; it stores the relationships between everyone. So basically, some of the added challenges we're dealing with now are representing different people's opinions of each other, how they're connected, and how information should flow between them, in addition to understanding all of the systems I just mentioned above. So one other thing I'll share that has evolved: the world has changed a lot since 2023, so we keep reevaluating how we should be building things, constantly. And now we have a million-token input context window. We have prompts that are really cheap, and they're also really, really fast.

00:09:27 Sam Whitmore: So some of the things that we held true in terms of compressing knowledge and context, we no longer hold true. Here's an example. So if you look back at this pipeline I shared before, here's an updated version that we're experimenting with now, which is getting rid of episodic and entity-level compression in favor of real-time Q&A. So that means that, depending on your system, maybe you don't need to be compressing context at all. Because, again, like I said at the beginning, the raw data is always the best source of truth. So it's like, why would you create a secondary artifact as a stepping stone between you and what the user's asking?

00:10:10 Sam Whitmore: Ideally, you just wanna examine the context. And so we do that pretty frequently, depending on how much data we're dealing with. We basically try to do the minimal amount of engineering possible. And our theory kind of going forward is that this trend will only continue. So we think the procedural memory and, basically, the insights, the interpretation and analysis that the thing does, is the important part of memory. It's like the record of its thoughts about you and kind of its notes to itself is the important part. You can almost separate that from retrieval as a problem. You can say, like, okay, maybe there'll be an infinite log of my interactions, and model notes will be interpolated in, in the future.

00:10:56 Sam Whitmore: And so maybe we don't even have to deal with retrieval and context compression at all. So I guess, if I want you guys to take away one thing, it's that the perfect memory architecture doesn't exist. Start with what your product is supposed to do, and then think from first principles about how to make it work. And do that all the time, because the world is changing and you might not need to invest that much in memory infrastructure. That's it. So you can follow us on Twitter at New Computer. Thank you.
