Harvard Researchers Uncover Key Role of External Feedback in Agent Memory Management
Recent advances in large language models have enabled a wave of interactive intelligent agents that show impressive potential in areas such as code generation, autonomous driving, and personal assistance. To keep learning and improving, these agents need memory capabilities similar to those of humans: they must record past task inputs and outputs and recall them when tackling new tasks. Current memory modules, however, are typically tailored to a single task. Autonomous driving agents store vehicle trajectory and state data, code generation agents save code snippets, and personal assistant agents summarize conversation history. This fragmented, task-specific design makes it hard to study general principles and commonalities in memory management.

To address this, Alex Xidi Xiong, a Harvard University doctoral student and former undergraduate at the University of Illinois Urbana-Champaign, and his research team focused on the most fundamental memory management operations: addition and deletion. Their goal was to provide systematic experimental evidence and general principles, sparing developers the ad-hoc strategies that can make agent performance unstable or even degrade over time. Unlike previous studies of complex memory mechanisms, this research centered on these basic operations and on how different levels of external feedback influence them.

The investigation identified three key patterns in memory management: the experience-following phenomenon, the error propagation effect, and misaligned experience replay. Experience-following is an agent's tendency to replicate outputs from similar past tasks regardless of the quality of those memories, producing consistent but potentially suboptimal behavior. Error propagation occurs when imprecise feedback lets low-quality or incorrect outputs into memory; future tasks then imitate them, causing a cascade of errors and long-term performance degradation. Misaligned experience replay means that even correct memories can hurt performance when they no longer fit the current task context or have become outdated, which underscores the need for precise external feedback to maintain and delete memories effectively.

The study thus highlights an important and often overlooked issue: the critical role of accurate, reliable external feedback in memory management. In many real-world scenarios such high-quality feedback is absent, and memory systems can then hinder rather than enhance an agent's long-term performance. The team hopes the work will inspire further research into general mechanisms across the modules of large-model agents and provide empirical guidance for future memory system design, ultimately contributing to more intelligent, self-evolving agents.
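To make the three patterns concrete, here is a minimal, hypothetical sketch in Python of a feedback-gated memory module. It is not the team's actual framework; the names (`MemoryStore`, `Record`), the similarity function, and the `add_threshold` value are all illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    task: str        # past task input
    output: str      # output the agent produced for that task
    feedback: float  # external quality signal in [0, 1]; often noisy in practice

def similarity(a: str, b: str) -> float:
    # Toy lexical overlap; a real agent would use embedding similarity.
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / max(len(wa | wb), 1)

@dataclass
class MemoryStore:
    records: list[Record] = field(default_factory=list)
    add_threshold: float = 0.7  # only store outputs that feedback judges good enough

    def add(self, record: Record) -> None:
        # Addition: if feedback is imprecise, low-quality outputs slip
        # past this gate, seeding the error propagation effect.
        if record.feedback >= self.add_threshold:
            self.records.append(record)

    def retrieve(self, task: str) -> Record | None:
        # Retrieval: return the most similar past record, regardless of
        # whether its output was actually correct; the agent's tendency
        # to imitate it is the experience-following phenomenon.
        if not self.records:
            return None
        return max(self.records, key=lambda r: similarity(task, r.task))

    def prune(self, is_stale) -> None:
        # Deletion: drop records that are outdated or no longer fit the
        # task context (misaligned experience replay); identifying them
        # again requires reliable external feedback.
        self.records = [r for r in self.records if not is_stale(r)]
```

Under this reading, all three failure modes reduce to the same dependency: both the addition gate and the pruning step are only as good as the external feedback that drives them.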
Reflecting on the research process, Xiong told DeepTech that he and his co-corresponding author, Zhen Xiang, now an assistant professor at the University of Georgia, set out to find a memory management approach that could be applied broadly across different types of agents. Early in the project they tried many complex methods, but the vast differences among agent tasks and the ambiguity of the research problem made truly universal solutions elusive. After reflection, they narrowed their focus to memory addition and deletion, operations that are fundamental to any memory system and have clear implementations across agents.

The team then built a unified experimental framework, selecting agents from multiple domains, including healthcare, autonomous driving, and IoT security, to test the generality of their findings. Extensive experiments and analysis surfaced the three core patterns described above and confirmed the indispensable role of accurate external feedback in maintaining effective memory systems, offering clear directions for future research and applications. Xiong acknowledged that the experiments were costly: using large models such as GPT-4o as agent backbones meant running thousands of tasks and incurring substantial API expenses, which underscores both the difficulty and the value of the work.

The study, titled “How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior,” was published on arXiv with Xiong as first author. Next, the team plans to explore ways to minimize the negative impact of memory modules when high-quality external feedback is unavailable, aiming to improve long-term agent performance. They believe the research holds significant value for the practical application of large-model agents.
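To see concretely why feedback quality matters, here is a hedged toy simulation, not drawn from the paper, in which memory addition is gated by a noisy feedback signal and the agent replays stored experiences on later tasks; the threshold, noise levels, and quality model are all illustrative assumptions.

```python
import random

random.seed(0)

def run(noise: float, n_tasks: int = 1000, threshold: float = 0.7) -> float:
    """Mean true task quality when memory addition is gated by noisy feedback."""
    stored = []  # true quality of each stored output
    total = 0.0
    for _ in range(n_tasks):
        if stored:
            # Experience-following: imitate a stored record and inherit
            # its true quality instead of solving from scratch.
            quality = random.choice(stored)
        else:
            quality = random.random()  # no memory yet; solve unaided
        total += quality
        # Addition gate: observed feedback = true quality + noise, so a
        # bad output can look good and enter memory (error propagation).
        if quality + random.uniform(-noise, noise) >= threshold:
            stored.append(quality)
    return total / n_tasks

for noise in (0.0, 0.2, 0.4):
    print(f"feedback noise {noise:.1f} -> mean quality {run(noise):.2f}")
```

With perfect feedback the gate admits only good experiences, so replay is safe; as the noise grows, a bad output can look good, enter memory, and then be imitated and re-stored on later tasks, the cascading failure the study warns about.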