Will Fu-Hinthorn, Founding Engineer @ LangChain
Will Fu-Hinthorn on why one-size-fits-all memory fails and how to build context-aware systems that learn.
Will Fu-Hinthorn, an engineer at LangChain, shares lessons from building memory systems for LLM agents. He explains that early attempts to create a general-purpose memory server revealed that effective memory is not a monolithic component but a nuanced, application-specific software system. The key is to start with the problem you're solving and work backward, focusing on customization and experimentation.
Insights
- General-purpose memory is a myth: Effective memory systems must be tailored to the specific goals and context of your application.
- Memory is software, not hardware: Start by defining your application's needs, not by choosing a specific database technology.
- Updates are harder than extraction: The real challenge is not just pulling out facts, but integrating them into a consistent knowledge base without introducing errors.
- Context is king: Memory should be treated as one component within a larger system that includes all relevant data sources, not as a standalone feature.
- An experimental approach is crucial: The best way to build a successful memory system is to look at your data, test different approaches, and iterate.
- Different jobs require different memory types: An agent recalling facts (semantic memory) uses a different system than one learning a multi-step procedure (procedural memory).
- File systems as a memory backend: LangChain is exploring a novel approach using a file system-like API for memory, hypothesizing that models heavily optimized for software engineering will excel at managing it.
- Take ownership of your memory system: Avoid black-box solutions and instead build and control the logic and architecture yourself for the most flexible and powerful results.
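The three memory types named above can be pictured as distinct record shapes. Below is a minimal, hypothetical sketch in Python; the class and field names are ours for illustration, not the LangMem SDK's actual API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Illustrative only: three memory "types" from the talk, modeled as records.

@dataclass
class SemanticMemory:
    """A fact about the user or domain."""
    subject: str
    fact: str

@dataclass
class EpisodicMemory:
    """A concrete past experience, situated in time."""
    summary: str
    occurred_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class ProceduralMemory:
    """A learned multi-step procedure for a task."""
    task: str
    steps: list[str] = field(default_factory=list)

# An email assistant might hold all three kinds at once:
memories = [
    SemanticMemory("user", "prefers meetings after 10am"),
    EpisodicMemory("rescheduled the Q3 review twice last week"),
    ProceduralMemory("security report", ["acknowledge", "escalate to on-call", "file ticket"]),
]
```

The point of separating the shapes is that each type implies a different retrieval and update strategy: facts get reconciled, episodes accumulate, and procedures get revised as a whole.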
The Big 3
1. The Flaw of Generic Memory
Will emphasizes that a "one-size-fits-all" memory solution is impractical because different applications require different kinds of memory. For instance, ChatGPT's memory is built for broad user preferences and conversation summaries, whereas a specialized email assistant needs to remember scheduling constraints, stylistic preferences, and specific procedures. LangChain's initial, more rigid "memory server" was not flexible enough for these varied, real-world use cases, proving that memory must be treated as software tailored to a specific job.
- Diverse Requirements: Applications need to remember different things, including facts, relationships, episodic experiences, and procedures.
- Application-Specific Design: An email assistant's memory needs (e.g., scheduling, security protocols) are fundamentally different from a general chatbot's.
- Software, Not Hardware: The choice of architecture should follow the application's function, not be dictated by a pre-selected database like a graph DB.
2. Memory as an Integrated Context System
Memory doesn't operate in a vacuum. It's one piece of a broader context system that includes everything an agent needs to know, such as a codebase, recent events, or other user data. A key challenge is ensuring that new information is synthesized correctly into the agent's existing "world model" without causing contradictions or errors. Isolating memory in a separate service, as LangChain first tried, makes it difficult to create a coherent and holistic view for the agent.
- Holistic Worldview: Memory must be integrated with all other data and predictions available to the agent.
- Error-Prone Updates: Synthesizing new knowledge and connecting it to an existing domain model is significantly harder than simply extracting it.
- Avoiding Silos: When memory is siloed, it's difficult for an agent to form a consistent understanding of all the information it engages with.
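The extract-versus-update distinction can be made concrete with a toy reconciliation step. This is a deliberately naive sketch (all names are illustrative, and the conflict policy is a placeholder for what a real system might delegate to an LLM or a tool like TrustCall):

```python
# Why updates are harder than extraction: before writing a new fact we must
# reconcile it against what is already stored, not blindly append it.

def reconcile(store: dict[str, str], key: str, new_value: str) -> str:
    """Insert or update a fact, reporting what happened."""
    old = store.get(key)
    if old is None:
        store[key] = new_value
        return "added"
    if old == new_value:
        return "unchanged"
    # Conflict: a naive system would silently overwrite or keep duplicates.
    # A real system might merge via an LLM, keep both values with timestamps,
    # or prefer the newer observation. Here we simply prefer the new value.
    store[key] = new_value
    return f"updated (was: {old!r})"

facts = {"timezone": "US/Pacific"}
reconcile(facts, "coffee", "black")            # no prior value: added
reconcile(facts, "timezone", "US/Eastern")     # conflict: updated, old value reported
```

Even this toy version shows why a siloed memory service struggles: the conflict policy depends on what the rest of the application already knows.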
3. A Flexible, Developer-Centric Toolkit
Based on these lessons, LangChain developed an SDK designed for ultimate flexibility. It empowers developers to own their memory architecture by providing primitives for different memory types while allowing them to define the extraction and update logic in their own code. This approach supports experimentation with different storage backends and processing methods, ensuring the memory system can evolve with the application and the underlying LLM technology.
- Customization is Key: The SDK allows developers to define how information is extracted, reconciled, and updated, providing primitives for facts, instructions, and episodes.
- Flexible Storage: The system is backend-agnostic, supporting any storage solution with basic get, put, and search methods, including an experimental file system approach.
- Flexible Processing: Developers can choose how to process interactions (in real time, in batches, or deferred) to best suit the application's latency and cost requirements.
Transcript
00:00:14 Will Fu-Hinthorn: Thank you. Thank you. Thank you, Nicole, and thank you to Greg for organizing this, and Matt, and thank you all for being here. It's kinda crazy to see how much excitement there is for this ambiguous and broad topic we call memory. So I'm Will, an engineer at LangChain. One of the honors that I get at LangChain is to work with a lot of our customers and users in building successful memory and context systems. And so this set of slides is just a few takeaways, a few lessons that I've learned, or had reinforced, over the past year about what it takes to build a successful memory system.
00:00:52 Will Fu-Hinthorn: And if you take nothing else from this, it's that it's really hard to make a general purpose one. They're best if they're focused on your specific application and to always look at your data and have an experimental approach. So first, we're gonna roll it back. A little over a year ago, we launched our first memory server. It was a server designed to take in interactions between a user and an agent or a chatbot and it would extract derived insights about users that you could then later use. We launched LangFriend, which is this journaling app, a demo in order to show how to use it. And we had a bunch of enterprise design partners.
00:01:29 Will Fu-Hinthorn: Over the subsequent months, we worked really closely with these teams and learned that, first of all, memory is very hard, and we picked up a bunch of lessons that I'm gonna try to share with you today that caused us to rethink how we wanted to ship this tooling and all of these approaches for memory. The first being that there's no real one-size-fits-all memory solution. And if you think about it for more than a few seconds, this kinda makes sense. What we call memory touches on a lot of downstream jobs to be done, or skills. We need to be remembering facts, information, relationships, this deeply interconnected web of information that we need to know.
00:02:12 Will Fu-Hinthorn: And an agent also expects to know all this information in order to perform well. But that's not all. It needs to remember all of that situated in a temporal context. It needs to have more episodic memory in order to remember these experiences, and it also needs to be able to understand and learn procedures and new information in order to execute on particular tasks. Not all of these memory types are necessary for every application, and this list is non-exhaustive, but these are some of the different ways that we see people wanting to learn new types of things in their applications.
00:02:44 Will Fu-Hinthorn: And another reinforcement of this is that you can see the different types of things people wanna be learning in real applications. So Nicole mentioned ChatGPT launched their updated memory implementation recently, and they really catered it to their particular UX and use case. They have assistant response preferences so it can try to learn the way you'd prefer it to respond to you in general. It has some topics that it's trying to model out. It's got useful insights about the user that may or may not be useful depending on the context. It has summaries of recent conversations to extend that conversational buffer into a sort of compressed representation, so it has more temporal context there.
00:03:24 Will Fu-Hinthorn: And then it has a bunch of random interaction metadata that it considers to be relevant depending on the use case. Contrast that with something like an email assistant, where you really wanna be focused on a particular task. You wanna be focusing on the style and content that you're gonna be putting into it. Whenever it's scheduling, you really need to know what you prefer, what you don't, where you're available, all that kind of stuff. You need a deep understanding of different sorts of procedures: if someone is emailing you reporting a security event, you need to follow all of that, versus something where someone's just trying to meet up for coffee.
00:03:56 Will Fu-Hinthorn: All of these things you can try to program ahead of time, but if you want your agent to be able to learn this, you might not get too far if you're just reaching for an off-the-shelf system. The second lesson is that updates and validation are especially error-prone. It's one thing to extract what you think is particularly interesting about a bit of a conversation or a document; LLMs are good at summarizing. It's even harder to then synthesize that and connect it to all the existing knowledge and domain that you have in order to create a consistent world model that improves your predictions downstream.
00:04:28 Will Fu-Hinthorn: And so a lot of this ends up being application-specific, which reinforces our lesson, too. We released a couple of tools for this. There's things like TrustCall to help with updates so your LLMs aren't willy-nilly deleting things. We've released some other tooling and examples around it. But at the end of the day, a lot of this is about how you're presenting and mapping your memory context to the information architecture or the needs of your application. Lesson three is that it's part of a broader context system. Memory isn't just this one isolated thing where you're learning about the user's preferences.
00:04:58 Will Fu-Hinthorn: This is a part of the situation in which the agent finds itself. Be that your code base, be that the recent events the user's gone through, and other sorts of things. And when we had originally built our memory server, we had really focused and leaned into what we could differentiate. But a lot of the people we worked with wanted to then integrate this with all the other existing predictions, preferences, and other data that they have for their model. And while you can synthesize this all at prediction time or retrieval time, it's hard to create a sort of coherent worldview over everything the user engages with, or everything the agent engages with, if you're not treating this more holistically.
00:05:36 Will Fu-Hinthorn: Lesson four, and this is one thing that we already sort of believed, but we especially believe now, is that memory is software, not necessarily hardware. We often have people come to us and talk about, oh, I need a graph DB, or I need a very particular instantiation of memory for this. When really that's backwards: you need to start from what you're trying to solve. And we sort of made this mistake in that we had a very particular memory server with a particular instance. We got great feedback on that, but we found it wasn't sufficiently flexible for people depending on the deployment context. So all these insights drove our creation of the LangMem SDK, and we organized that around three main principles.
00:06:14 Will Fu-Hinthorn: One is we wanted to support easy customization and experimentation. All of the actual extraction runs in the code that you define. We have a number of primitives to try to address some of these memory types that we've talked about before, but we really want you to be taking ownership over the type of information and how it's updated. We wanna support flexible storage and organization of memories so you can define what makes sense for you, and we aren't gonna lock you into a particular vector database or anything like that; you can swap it out. And we wanted flexible processing as well. A lot of people talk about batch versus online and all these different ingestion methods.
00:06:46 Will Fu-Hinthorn: We wanna make sure that we weren't only speaking to one particular thing in this SDK. So to go a little bit deeper on the first topic of customization and experimentation, we have some primitives that we included in the library for the memory types that we found people often lean towards. So one is this learning of knowledge and facts. We have this background memory manager where you can really customize the instructions, the steps that are taken, how to reconcile information, and all this is orchestrated in your own code. Another way that we have it, that's not shown here, is we just have some simple tools and you can be defining it with an agent.
00:07:20 Will Fu-Hinthorn: And as models get better, if you wanna treat this as purely reasoning over things, you can do that. We have learning of instructions, or codebooks, or whatever you'd like to call that, for complex workflows. This is very similar to prompt optimization. It's typically not the whole prompt, but you have different sections of the prompt where you're gonna have instructions that are custom to the user. And this is especially data-driven, where you wanna be going over batches of conversations, extracting insights from them, looking at explicit user feedback, and then incorporating all of that into updates that you can then measure.
00:07:54 Will Fu-Hinthorn: I think if you put all of this into a very generic graph, you kind of lose out on a lot of the ability to elicit the proper responses that you want, if you wanna look at things like tone, or at learning new capabilities, or at more complex multi-step interactions that you actually need your agent to be learning. And finally, there's also learning episodes. This is probably the least supported method in the library, but we let you define synthetic few-shots to be extracted, and then you can incorporate them downstream. The second principle we wanted to support was about flexible storage and organization.
00:08:28 Will Fu-Hinthorn: So we allow you to store it in pretty much any back end where you have get, put, and search methods, and so we have all these integrations as well. One thing, as a tangent, that we're kind of excited to test out a little bit more is this file system as a back end, or at least exposing a file-system-like API. And the reason for that is a lot of these LLMs and foundation model companies are really optimizing for software engineering. And so one hypothesis that we're testing out, and we'll report back with more results in a bit, is that perhaps these models will lend themselves more to managing memories as if they were a file system.
00:09:00 Will Fu-Hinthorn: And so the flexibility of this library allows you to experiment with all of these things as LLMs lean into different directions over time. You can organize the memories by user, agent, role, organization, all those types of things. So you're not locked into only organizing things around the user. You can have agents learn things just for themselves and share them across, and this is all orthogonal to the actual information processing. And you can process how and when you like. We have some abstraction around being able to execute this on a deferred basis so you can be batching all of this later on.
00:09:35 Will Fu-Hinthorn: If you have a really rapidly occurring interaction, you can have it processed in real time, or you can delay it so that there's deduplication of information, and all this can be managed either through a local executor or online with the LangGraph platform, which can be scaled horizontally. And that concludes my talk. Here's a summary slide of all the things that I wanted to share. Again, if you take nothing else, it's that there's really no one-size-fits-all, and you really wanna start with what you hope to accomplish, what you hope your agents learn, and then work your way backwards to pick the right solution for you.
00:10:07 Will Fu-Hinthorn: You know, test things out, but don't just be willing to offload all of this to one particular solution that claims to be a holy grail. We've put the LangMem SDK out there; we encourage you to experiment with it, and we encourage you to give us feedback. We're always looking to improve it, and we encourage you to take ownership of that. So thank you.