Memory in the Age of AI Agents

Abstract

Memory has emerged as, and will remain, a core capability of foundation-model-based agents. While research on agent memory is growing rapidly and attracting unprecedented attention, the field has also become increasingly fragmented. Existing works under the umbrella of agent memory often differ substantially in their motivations, implementations, and evaluation protocols, and the proliferation of loosely defined memory terms has further obscured conceptual clarity. Traditional taxonomies such as long-term/short-term memory are no longer sufficient to capture the diversity of modern agent memory systems. This work aims to provide an up-to-date overview of current research on agent memory. We first delineate the scope of agent memory precisely and distinguish it from related concepts such as LLM memory, retrieval-augmented generation (RAG), and context engineering. We then examine agent memory through the unified perspectives of forms, functions, and dynamics. From the perspective of forms, we identify three dominant realizations of agent memory: token-level, parametric, and latent memory. From the perspective of functions, we propose a finer-grained division into factual, experiential, and working memory. From the perspective of dynamics, we analyze how memories are formed, evolve, and are retrieved over time. To support practical development, we compile a comprehensive overview of memory benchmarks and open-source frameworks. Beyond this consolidation, we outline a forward-looking perspective on emerging research directions, including memory automation, integration with reinforcement learning, multimodal memory, multi-agent memory, and trustworthiness. We hope this survey serves not only as a reference for existing work but also as a conceptual foundation for rethinking memory as a first-class primitive in the design of future agentic intelligence.

One-sentence Summary

Researchers from National University of Singapore, Renmin University of China, Fudan University, Peking University, and collaborating institutions present a comprehensive "forms-functions-dynamics" taxonomy for agent memory, identifying token-level/parametric/latent memory forms, factual/experiential/working memory functions, and formation/evolution/retrieval dynamics to advance persistent, adaptive capabilities in LLM-based agents beyond traditional long/short-term memory distinctions.

Key Contributions

  • The paper addresses the growing fragmentation in AI agent memory research, where inconsistent terminology and outdated taxonomies like long/short-term memory fail to capture the diversity of modern systems, hindering conceptual clarity and progress.
  • It introduces a unified three-dimensional taxonomy organizing agent memory by forms (token-level, parametric, latent), functions (factual, experiential, working memory), and dynamics (formation, evolution, retrieval), moving beyond coarse temporal categorizations.
  • Supporting this framework, the survey compiles representative benchmarks and open-source memory frameworks while mapping existing systems into the taxonomy through Figure 1, and identifies emerging frontiers like multimodal memory and reinforcement learning integration.

Introduction

The authors highlight that memory has become a cornerstone capability for foundation model-based AI agents, enabling long-horizon reasoning, continual adaptation, and effective interaction with complex environments. As agents evolve beyond static language models into interactive systems for applications like personalized chatbots, recommender systems, and financial investigations, robust memory mechanisms are essential to transform fixed-parameter models into adaptive systems that learn from environmental interactions. Prior work faces significant fragmentation with inconsistent terminology, divergent implementations, and insufficient taxonomies—traditional distinctions like long/short-term memory fail to capture contemporary systems' complexity while overlapping concepts like LLM memory, RAG, and context engineering create conceptual ambiguity. To address these challenges, the authors establish a comprehensive "forms-functions-dynamics" framework that categorizes memory into three architectural forms (token-level, parametric, and latent), three functional roles (factual, experiential, and working memory), and detailed operational dynamics covering memory formation, retrieval, and evolution. This unified taxonomy clarifies conceptual boundaries, reconciles fragmented research, and provides structured analysis of benchmarks, frameworks, and emerging frontiers including reinforcement learning integration, multimodal memory, and trustworthy memory systems.

Dataset

The authors survey two primary categories of evaluation benchmarks for assessing LLM agent memory and long-term capabilities:

  • Memory/Lifelong/Self-Evolving Agent Benchmarks

    • Composition: Explicitly designed for memory retention, lifelong learning, or self-improvement (e.g., MemBench, LoCoMo, LongMemEval).
    • Key details:
      • Focus on factual/experiential memory, multimodal inputs, and simulated/real environments.
      • Sizes range from hundreds to thousands of samples/tasks (e.g., MemBench for user modeling, LongMemEval tracking catastrophic forgetting).
      • Filtering emphasizes controlled memory retention, preference tracking, or multi-episode adaptation.
    • Usage: Evaluated via Table 8, which categorizes benchmarks by memory focus, modality, and scale (e.g., LoCoMo tests preference consistency; LifelongAgentBench measures forward/backward transfer).
  • Other Related Benchmarks

    • Composition: Originally for tool use, embodiment, or reasoning but stress long-horizon memory (e.g., WebShop, ALFWorld, SWE-Bench Verified).
    • Key details:
      • Embodied (ALFWorld), web-based (WebArena), or multi-task (AgentGym) setups.
      • Implicitly test context retention across sequential actions (e.g., WebShop requires recalling prior navigation steps).
      • Scales vary: WebArena uses task-based evaluation; GAIA assesses multi-step research.
    • Usage: Table 9 compares frameworks supporting these benchmarks, noting memory types (factual/experiential), multimodality, and internal structures (e.g., MemoryBank for episodic knowledge consolidation).

The paper uses these benchmarks solely for evaluation, not training, to measure long-context retention, state tracking, and adaptation. No additional data preprocessing is applied; instead, the benchmarks are compared feature by feature in Tables 8–9, highlighting memory mechanisms such as self-reflection (Evo-Memory) or tool-augmented storage (MemoryAgentBench).
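To illustrate what such long-horizon evaluation typically looks like in practice, the sketch below runs a memory-equipped agent over multiple sessions and then asks questions that can only be answered by recalling earlier sessions. The episode format and all names are hypothetical assumptions and do not reproduce the schema of MemBench, LoCoMo, or any other benchmark listed above.

```python
# Hedged sketch of a multi-session memory evaluation loop.
# `agent` is assumed to expose .ingest(session_text) and .answer(question);
# the episode format below is hypothetical, not a real benchmark schema.

episodes = [
    {"sessions": ["User: I love hiking in autumn.", "User: My dog is called Miso."],
     "questions": [("What is the user's dog called?", "Miso")]},
]

def evaluate(agent, episodes) -> float:
    correct, total = 0, 0
    for ep in episodes:
        for session in ep["sessions"]:           # the agent sees sessions sequentially
            agent.ingest(session)
        for question, gold in ep["questions"]:   # answering requires recalling earlier sessions
            prediction = agent.answer(question)
            correct += int(gold.lower() in prediction.lower())
            total += 1
    return correct / max(total, 1)
```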

Method

The authors leverage a comprehensive, multi-faceted framework for LLM-based agent memory systems, which integrates distinct memory forms, functional roles, and dynamic lifecycle processes to enable persistent, adaptive, and goal-directed behavior. The overall architecture is not monolithic but rather a layered ecosystem where token-level, parametric, and latent memory coexist and interact, each serving complementary purposes based on the task’s demands for interpretability, efficiency, or performance.

At the core of the agent loop, each agent $i \in \mathcal{I}$ observes the environment state $s_t$ and receives an observation $o_t^i = O_i(s_t, h_t^i, \mathcal{Q})$, where $h_t^i$ is the agent's accessible interaction history and $\mathcal{Q}$ is the fixed task specification. The agent then executes an action $a_t = \pi_i(o_t^i, m_t^i, \mathcal{Q})$, where $m_t^i$ is a memory-derived signal retrieved from the evolving memory state $\mathcal{M}_t \in \mathbb{M}$. This memory state is not a static buffer but a dynamic knowledge base that undergoes continuous formation, evolution, and retrieval, forming a closed-loop cognitive cycle.
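To make this loop concrete, here is a minimal Python sketch of one agent step: retrieve a memory-derived signal for the current observation, act, and write the new experience back into the store. All names (`MemoryStore`, `policy`, `agent_step`) are illustrative assumptions rather than an API from the paper or any surveyed system, and the policy is a placeholder for an LLM call.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Evolving memory state M_t: a list of stored experience records."""
    records: list[dict] = field(default_factory=list)

    def retrieve(self, observation: str, k: int = 3) -> list[dict]:
        # Toy relevance score: number of words shared with the observation.
        def score(rec: dict) -> int:
            return len(set(observation.split()) & set(rec["text"].split()))
        return sorted(self.records, key=score, reverse=True)[:k]

    def write(self, text: str) -> None:
        self.records.append({"text": text})

def policy(observation: str, memory_signal: list[dict], task: str) -> str:
    """Placeholder for pi_i: a real system would call an LLM with obs + memory + task."""
    context = " | ".join(r["text"] for r in memory_signal)
    return f"act(task={task}, obs={observation}, memory={context})"

def agent_step(state: str, memory: MemoryStore, task: str) -> str:
    observation = state                          # o_t^i = O_i(s_t, h_t^i, Q), simplified
    m_t = memory.retrieve(observation)           # memory-derived signal m_t^i
    action = policy(observation, m_t, task)      # a_t = pi_i(o_t^i, m_t^i, Q)
    memory.write(f"{observation} -> {action}")   # memory formation closes the loop
    return action

if __name__ == "__main__":
    mem = MemoryStore()
    for step, s in enumerate(["user asks about flights", "user prefers aisle seat"]):
        print(step, agent_step(s, mem, task="book a flight"))
```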

The memory system’s architecture is structured around three primary forms, each with distinct representational properties and operational characteristics. Token-level memory, as depicted in the taxonomy, organizes information as explicit, discrete units that can be individually accessed and modified. It is further categorized into flat (1D), planar (2D), and hierarchical (3D) topologies. Flat memory stores information as linear sequences or independent clusters, suitable for simple chunking or dialogue logs. Planar memory introduces explicit relational structures such as graphs or trees within a single layer, enabling richer semantic associations and structured retrieval. Hierarchical memory extends this by organizing information across multiple abstraction layers, supporting coarse-to-fine navigation and cross-layer reasoning, as seen in pyramid or multi-layer architectures.
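The three topologies can be pictured as progressively richer containers. The sketch below uses hypothetical class names to illustrate the distinction; it is not the data model of any particular surveyed system.

```python
from dataclasses import dataclass, field

@dataclass
class FlatMemory:
    """1D: a linear sequence of text chunks (e.g., a dialogue log)."""
    chunks: list[str] = field(default_factory=list)

@dataclass
class PlanarMemory:
    """2D: chunks plus explicit relations within a single layer (a graph)."""
    nodes: dict[str, str] = field(default_factory=dict)              # id -> chunk
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (src, relation, dst)

@dataclass
class HierarchicalMemory:
    """3D: multiple abstraction layers, e.g. raw turns -> session summaries -> user profile."""
    layers: list[PlanarMemory] = field(default_factory=list)

    def coarse_to_fine(self, query: str) -> list[str]:
        """Navigate from the most abstract layer down to raw entries (toy keyword match)."""
        hits: list[str] = []
        for layer in reversed(self.layers):  # most abstract layer first
            hits += [text for text in layer.nodes.values() if query.lower() in text.lower()]
        return hits
```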

Parametric memory, in contrast, stores information directly within the model’s parameters, either by internalizing knowledge into the base weights or by attaching external parameter modules like adapters or LoRA. This form is implicit and abstract, offering performance gains through direct integration into the model’s forward pass but at the cost of slower updates and potential catastrophic forgetting. Latent memory operates within the model’s internal representational space, encoding experiences as continuous embeddings, KV caches, or hidden states. It is human-unreadable but machine-native, enabling efficient, multimodal fusion and low-latency inference, though it sacrifices transparency and editability.
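As one way to picture the parametric form, the LoRA-style mechanism can be sketched in plain PyTorch: new knowledge is internalized by training only small low-rank matrices attached to a frozen base projection. This is a generic sketch of the idea, not the implementation used by any specific memory system named in the survey.

```python
import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    """Parametric memory module: y = W_frozen x + (B A) x, with only A and B trainable."""
    def __init__(self, base_linear: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():      # freeze the base weights
            p.requires_grad = False
        d_in, d_out = base_linear.in_features, base_linear.out_features
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(d_out, rank))        # up-projection, zero-initialized

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.A.T @ self.B.T

# Usage: wrap one projection of a transformer block, then fine-tune only A and B.
layer = LowRankAdapter(nn.Linear(768, 768), rank=8)
trainable = [p for p in layer.parameters() if p.requires_grad]  # only A and B
```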

The functional architecture of the memory system is organized around three pillars: factual, experiential, and working memory. Factual memory serves as a persistent declarative knowledge base, ensuring consistency with user preferences and environmental states. Experiential memory encapsulates procedural knowledge, distilling strategies and skills from past trajectories to enable continual learning. Working memory provides a dynamic, bounded workspace for active context management during a single task or session, addressing both single-turn input condensation and multi-turn state maintenance.
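A hedged sketch of how these three roles might be kept separate in an agent's memory interface is shown below; the class names and fields are hypothetical and only illustrate the separation of concerns.

```python
from dataclasses import dataclass, field

@dataclass
class FactualMemory:
    """Persistent declarative knowledge: user preferences, environment facts."""
    facts: dict[str, str] = field(default_factory=dict)   # e.g. {"seat_preference": "aisle"}

@dataclass
class ExperientialMemory:
    """Procedural knowledge distilled from past trajectories."""
    skills: list[str] = field(default_factory=list)        # e.g. "retry search with a broader query"

@dataclass
class WorkingMemory:
    """Bounded workspace for the current task or session."""
    budget_tokens: int = 4096
    scratchpad: list[str] = field(default_factory=list)

    def add(self, item: str) -> None:
        self.scratchpad.append(item)
        # Word count is a crude stand-in for a token budget; evict the oldest entries when over it.
        while sum(len(s.split()) for s in self.scratchpad) > self.budget_tokens:
            self.scratchpad.pop(0)
```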

The operational dynamics of the memory system are governed by a cyclical lifecycle of formation, evolution, and retrieval. Memory formation transforms raw experiences into information-dense knowledge units through semantic summarization, knowledge distillation, structured construction, latent representation, or parametric internalization. Memory evolution then integrates these new units into the existing repository through consolidation, updating, and forgetting mechanisms, ensuring coherence, accuracy, and efficiency. Finally, memory retrieval executes context-aware queries to access relevant knowledge at the right moment, involving timing, query construction, retrieval strategies, and post-processing to deliver concise, coherent context to the LLM policy.
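The lifecycle can be expressed as three operations over a shared store. In the minimal sketch below, naive truncation stands in for LLM-based summarization and lexical overlap stands in for embedding retrieval; all function names are illustrative assumptions rather than an API from the paper.

```python
def form(raw_experience: str) -> dict:
    """Formation: compress a raw trajectory into a dense knowledge unit.
    Naive truncation stands in for LLM-based summarization."""
    return {"summary": raw_experience[:200], "source": raw_experience}

def evolve(store: list[dict], unit: dict, max_size: int = 1000) -> None:
    """Evolution: consolidate, update, and forget.
    Exact duplicates are merged, and the oldest entries are dropped when over capacity."""
    if not any(u["summary"] == unit["summary"] for u in store):  # crude consolidation
        store.append(unit)
    del store[:-max_size]                                        # crude forgetting

def retrieve(store: list[dict], query: str, k: int = 3) -> list[dict]:
    """Retrieval: rank stored units by a toy lexical-overlap score."""
    def score(u: dict) -> int:
        return len(set(query.lower().split()) & set(u["summary"].lower().split()))
    return sorted(store, key=score, reverse=True)[:k]

# One pass of the closed loop.
store: list[dict] = []
evolve(store, form("The user booked a window seat but later asked to change to aisle."))
print(retrieve(store, "what seat does the user prefer?"))
```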

This entire framework is designed to be flexible and composable. Different agents may instantiate different subsets of these operations at varying temporal frequencies, giving rise to memory systems that range from passive buffers to actively evolving knowledge bases. The authors emphasize that the choice of memory type and mechanism is not arbitrary but reflects the designer’s intent for how the agent should behave in a given task, balancing trade-offs between interpretability, efficiency, and performance. The architecture thus supports a wide spectrum of applications, from multi-turn chatbots and personalized agents to reasoning-intensive tasks and multimodal, low-resource settings.

Experiment

  • A comparative analysis of open-source memory frameworks for LLM agents confirms broad support for factual memory (vector/structured stores, as sketched after this list) and growing integration of experiential traces (dialogue histories, episodic summaries) and multimodal memory
  • Frameworks range from agent-centric systems with hierarchical memory (e.g., MemGPT, MemoryOS) to general-purpose backends (e.g., Pinecone, Chroma), with many implementing short-/long-term separation and graph- or profile-based memory spaces
  • While some frameworks report initial results on memory benchmarks, most focus on providing scalable databases and APIs without standardized agent behavior evaluation protocols
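As a concrete illustration of the vector-store pattern these frameworks build on, below is a minimal, self-contained sketch of an embedding-based factual memory. The toy hashing "embedding" stands in for a real sentence encoder, and none of the names correspond to an actual framework API.

```python
import numpy as np

class VectorMemoryStore:
    """Minimal factual-memory backend: embed texts, retrieve by cosine similarity."""
    def __init__(self, dim: int = 64):
        self.dim = dim
        self.texts: list[str] = []
        self.vectors: list[np.ndarray] = []

    def embed(self, text: str) -> np.ndarray:
        # Toy "embedding": hash words into a fixed-size bag-of-words vector, then normalize.
        v = np.zeros(self.dim)
        for w in text.lower().split():
            v[hash(w) % self.dim] += 1.0
        return v / (np.linalg.norm(v) + 1e-8)

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(self.embed(text))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = self.embed(query)
        sims = [float(v @ q) for v in self.vectors]       # cosine similarity (vectors are unit-norm)
        order = np.argsort(sims)[::-1][:k]
        return [self.texts[i] for i in order]
```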
