General Agentic Memory via Deep Research

B. Y. Yan, Chaofan Li, Hongjin Qian, Shuqi Lu, Zheng Liu

Abstract

Memory is a critical component for AI agents; however, the widely adopted static memory, which aims to create readily available memory in advance, inevitably suffers from severe information loss. To address this limitation, we propose a novel framework called General Agentic Memory (GAM). GAM follows the principle of just-in-time (JIT) compilation, focusing on constructing optimized contexts for its client at runtime while keeping only simple but useful memory during the offline stage. To this end, GAM employs a dual architecture comprising the following components: (1) the Memorizer, which highlights key historical information with a lightweight memory while maintaining the complete history in a universal page-store; and (2) the Researcher, which retrieves and integrates relevant information from the page-store for its online request, guided by the pre-constructed memory. This design enables GAM to effectively leverage the agentic capabilities and test-time scalability of frontier large language models (LLMs), while also supporting end-to-end performance optimization via reinforcement learning. In our experimental study, we demonstrate that GAM achieves substantial improvements over existing memory systems across a variety of memory-grounded task-completion scenarios.

Summarization

Researchers from the Beijing Academy of Artificial Intelligence, Peking University, and Hong Kong Polytechnic University introduce General Agentic Memory (GAM), a framework that overcomes static memory limitations by applying a just-in-time compilation principle where a dual Memorizer-Researcher architecture dynamically constructs optimized contexts from a universal page-store for enhanced memory-grounded task completion.

Introduction

AI agents are increasingly deployed in complex fields like software engineering and scientific research, creating an urgent need to manage rapidly expanding contexts. As these agents integrate internal reasoning with external feedback, effective memory systems are essential for maintaining continuity and accuracy without overwhelming the model's context window.

Prior approaches typically rely on "Ahead-of-Time" compilation, where data is compressed into static memory offline. This method suffers from inevitable information loss during compression, struggles with ad-hoc requests due to rigid structures, and depends heavily on manual heuristics that hinder cross-domain generalization.

The authors propose General Agentic Memory (GAM), a framework based on "Just-in-Time" compilation that preserves complete historical data while generating customized contexts on demand. By treating memory retrieval as a dynamic search process rather than a static lookup, GAM ensures lossless information access tailored to specific queries through a dual-agent system.

Key innovations include:

  • Dual-Agent Architecture: The system employs a "Memorizer" to index historical sessions and a "Researcher" to perform iterative deep research and reflection to satisfy complex client needs.
  • High-Fidelity Adaptability: By maintaining full history in a database and retrieving only what is necessary at runtime, the framework avoids compression loss and adapts dynamically to specific tasks.
  • Self-Optimizing Generalization: The approach eliminates the need for domain-specific rules, allowing the system to operate across diverse scenarios and improve continuously through reinforcement learning.

Method

The authors leverage a dual-module architecture for their General Agentic Memory (GAM) system, designed to manage long agent trajectories efficiently while maintaining task performance. The framework operates in two distinct phases: an offline memorization stage and an online research stage. As shown in the figure below, the overall system consists of a memorizer and a researcher, both of which are large language model (LLM)-based agents. The memorizer processes the agent's historical trajectory during the offline phase, generating a compact memory representation and preserving the complete trajectory in a page-store. The researcher, in contrast, operates online to address client requests by retrieving and integrating relevant information from the page-store, ultimately producing an optimized context for downstream task completion.
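To make the division of responsibilities concrete, the sketch below shows one way the two modules and the shared page-store could be organized in Python. The class and field names (Page, PageStore, Memorizer, Researcher) are illustrative assumptions; the paper describes the design but does not publish an implementation or API.

```python
# A minimal sketch of GAM's dual-module layout; names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Page:
    """A complete record of one session, decorated with a contextual header."""
    page_id: int
    header: str   # essential context from the preceding trajectory
    content: str  # full, uncompressed session content


@dataclass
class PageStore:
    """Lossless storage of the full history as an append-only list of pages."""
    pages: list[Page] = field(default_factory=list)

    def append(self, header: str, content: str) -> Page:
        page = Page(page_id=len(self.pages), header=header, content=content)
        self.pages.append(page)
        return page


class Memorizer:
    """Offline module: builds a lightweight memory and fills the page-store."""
    def __init__(self, store: PageStore):
        self.store = store
        self.memory: list[str] = []  # incrementally extended list of memos


class Researcher:
    """Online module: turns a client request into an optimized context."""
    def __init__(self, store: PageStore, memory: list[str]):
        self.store = store
        self.memory = memory
```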

During the offline stage, the memorizer performs two key operations for each incoming session $s_i$. First, it executes a memorizing step, which generates a concise and well-structured memo $\mu_i$ that captures the crucial information of the new session. This memo is produced based on both the current session and the existing memory $m_i$, and the memory is incrementally updated by adding the new memo to form $m_{i+1}$. Second, the memorizer performs a paging operation, which creates a complete page for the session. This process begins by generating a header $h_i$ that contains essential contextual information from the preceding trajectory. The header is then used to decorate the session content, forming a new page that is appended to the page-store $p$. This two-step process ensures that the system maintains both a lightweight, optimized memory and a comprehensive, semantically consistent record of the agent's history.
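Building on the sketch above, one offline step for an incoming session $s_i$ might look as follows, assuming a generic `llm(prompt: str) -> str` helper; the prompt wording is purely illustrative and not the paper's actual memorization prompt.

```python
# A sketch of the offline memorizing and paging steps; prompts are illustrative.
def memorize_session(memorizer: Memorizer, session: str, llm) -> None:
    # Memorizing step: produce a concise memo mu_i from the new session and the
    # existing memory m_i, then update the memory to m_{i+1}.
    memo = llm(
        "Summarize the crucial information of the new session, "
        f"given the existing memory.\nMemory: {memorizer.memory}\nSession: {session}"
    )
    memorizer.memory.append(memo)  # m_{i+1} = m_i + [mu_i]

    # Paging step: generate a header h_i with essential context from the
    # preceding trajectory, then store the full session as a new page.
    header = llm(
        "Write a short header capturing the context preceding this session.\n"
        f"Memory: {memorizer.memory[:-1]}\nSession: {session}"
    )
    memorizer.store.append(header=header, content=session)
```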

In the online stage, the researcher is tasked with addressing a client's request. It begins by performing a planning step, which involves chain-of-thought reasoning to analyze the information needs of the request $r$. Based on this analysis, the researcher generates a concrete search plan using a provided search toolkit $\mathcal{T}$, which includes an embedding model for vector search, a BM25 retriever for keyword-based search, and an ID-based retriever for direct page exploration. The planning process is guided by a specific prompt, as illustrated in the figure below, which instructs the model to generate a JSON object specifying the required tools and their parameters.
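The paper shows the planning prompt only as a figure, so the exact schema is not reproduced here, but a plan of this kind could plausibly take the following shape, with the field names (`thought`, `actions`, `tool`, `top_k`, `page_ids`) assumed for illustration.

```python
# An illustrative shape for the researcher's search plan (field names assumed).
import json

example_plan = {
    "thought": "The request asks when the user changed jobs; search both "
               "semantically and by keyword, and re-read the most recent pages.",
    "actions": [
        {"tool": "embedding_search", "query": "user changing jobs", "top_k": 5},
        {"tool": "bm25_search", "query": "resignation new position", "top_k": 5},
        {"tool": "page_id", "page_ids": [41, 42]},
    ],
}
print(json.dumps(example_plan, indent=2))
```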

Upon receiving the search plan, the researcher executes the search actions in parallel, retrieving relevant pages $p_t$ from the page-store. It then integrates the information from the retrieved pages with the last integration result $\mathcal{I}$ for the request $r$, updating the integration result. This process is repeated iteratively. After each integration, the researcher performs a reflection step to determine whether the information needed to answer the request has been fully collected. This is done using a binary indicator $y$. If the reflection indicates that information is still missing ($y = \text{No}$), the researcher generates a new, more focused request $r'$ to drive another round of deep research. If the information is deemed complete ($y = \text{Yes}$), the research process concludes, and the final integration result is returned as the optimized context. The reflection process is guided by a prompt that instructs the model to identify missing information and generate targeted follow-up retrieval questions.
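A compact sketch of this plan-search-integrate-reflect loop is given below, again building on the earlier classes and assuming a generic `llm` callable plus a `search(action, store)` dispatcher over the toolkit; the actual prompts, JSON schema, and stopping criteria used by the authors may differ.

```python
# A sketch of the online deep-research loop; helpers `llm` and `search` are assumed.
import json
from concurrent.futures import ThreadPoolExecutor


def deep_research(researcher: Researcher, request: str, llm, search,
                  max_rounds: int = 4) -> str:
    integration = ""  # running integration result I
    for _ in range(max_rounds):
        # Planning: chain-of-thought analysis of the request, emitting a JSON search plan.
        plan = json.loads(llm(f"Plan searches for: {request}\nMemory: {researcher.memory}"))
        # Execute the planned search actions in parallel against the page-store.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda a: search(a, researcher.store),
                                    plan["actions"]))
        pages = [pg for batch in results for pg in batch]
        # Integration: merge newly retrieved pages into the running result I.
        integration = llm(f"Integrate for request '{request}':\n"
                          f"Previous: {integration}\nPages: {pages}")
        # Reflection: decide whether information is complete (y) and, if not, refine r.
        reflection = json.loads(llm(f"Is '{request}' fully answered by:\n{integration}\n"
                                    "Return JSON with 'complete' and 'follow_up'."))
        if reflection["complete"]:         # y = Yes: stop
            break
        request = reflection["follow_up"]  # y = No: new, more focused request r'
    return integration  # optimized context returned to the client
```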

Experiment

  • Evaluated GAM against memory-free methods (Long-LLM, RAG) and memory-based baselines (e.g., Mem0, LightMem) using LoCoMo, HotpotQA, RULER, and NarrativeQA benchmarks.
  • GAM consistently outperformed all baselines across every dataset, notably achieving over 90% accuracy on RULER multi-hop tracing tasks where other methods failed.
  • Demonstrated robustness to varying context lengths, maintaining high performance on HotpotQA contexts ranging from 56K to 448K tokens.
  • Model scaling analysis indicated that larger backbone models improve results, with the research module showing significantly higher sensitivity to model size than the memorization module.
  • Ablation studies confirmed that combining search tools (Page-id, Embedding, BM25) yields the best results and that removing the memory module causes substantial performance degradation.
  • Increasing test-time computation, specifically through higher reflection depth and more retrieved pages, resulted in steady performance gains.
  • Efficiency evaluations showed GAM incurs time costs comparable to Mem0 and MemoryOS while delivering superior cost-effectiveness.

Across all benchmarks, GAM achieves the best overall performance, consistently outperforming both memory-free and memory-based baselines on LoCoMo, HotpotQA, RULER, and NarrativeQA. The gains are most pronounced on complex tasks requiring multi-hop reasoning and long-context understanding, including over 90% accuracy on RULER multi-hop tracing, while performance remains stable across varying context lengths and efficiency stays competitive.

The authors also evaluate GAM with different LLM backbones on HotpotQA and NarrativeQA to measure the impact of model size. Larger models generally improve results: GAM reaches its highest average F1 score of 55.45 with GPT-4o-mini, outperforming all Qwen2.5 variants, while the smallest Qwen2.5-0.5B backbone scores the lowest at 9.08, with gains being particularly consistent on longer contexts.

The ablation study shows that GAM's effectiveness relies on combining multiple search tools and on using both the memory and research modules: the full system with all tools and modules yields the best results, while removing either component significantly degrades performance.
