Command Palette
Search for a command to run...
메모리 전이 학습: Coding Agent에서 메모리가 도메인 간에 어떻게 전이되는가
메모리 전이 학습: Coding Agent에서 메모리가 도메인 간에 어떻게 전이되는가
Kangsan Kim Minki Kang Taeil Kim Yanlai Yang Mengye Ren Sung Ju Hwang
초록
메모리 기반의 자기 진화(Memory-based self-evolution)는 코딩 Agent를 위한 유망한 패러다임으로 부상했습니다. 그러나 기존의 접근 방식은 대개 메모리 활용을 동일한 작업 도메인(homogeneous task domains)으로 제한하며, 다양한 실제 코딩 문제 전반에 걸쳐 존재하는 런타임 환경(runtime environments) 및 프로그래밍 언어와 같은 공유된 인프라 기반을 활용하지 못한다는 한계가 있습니다.이러한 한계를 해결하기 위해, 본 연구에서는 이기종 도메인(heterogeneous domains)으로부터 통합된 메모리 풀(unified memory pool)을 활용하는 메모리 전이 학습(Memory Transfer Learning, MTL)을 조사합니다. 우리는 구체적인 트레이스(concrete traces)부터 추상적인 통찰(abstract insights)에 이르는 네 가지 메모리 표현(memory representations)을 사용하여 6개의 코딩 benchmark 성능을 평가합니다.실험 결과, 도메인 간 메모리 활용은 평균 성능을 3.7% 향상시키는 것으로 나타났으며, 이는 작업 특정적인 코드보다는 검증 루틴(validation routines)과 같은 메타 지식(meta-knowledge)을 전이함으로써 이루어짐을 확인했습니다. 중요한 점은 추상화(abstraction) 정도가 전이 가능성을 결정한다는 것입니다. 즉, 높은 수준의 통찰은 일반화가 잘 이루어지는 반면, 낮은 수준의 트레이스는 과도한 특수성으로 인해 종종 부정적 전이(negative transfer)를 유발합니다.나아가, 전이 효과는 메모리 풀의 크기에 따라 확장되며, 서로 다른 모델 간에도 메모리 전이가 가능하다는 것을 보여줍니다. 본 연구는 메모리 활용 범위를 단일 도메인의 폐쇄적인 환경(silos) 너머로 확장하기 위한 경험적 설계 원칙을 정립합니다.프로젝트 페이지: https://memorytransfer.github.io/
One-sentence Summary
By investigating Memory Transfer Learning (MTL) through a unified memory pool across heterogeneous domains, researchers demonstrate that prioritizing abstract meta-knowledge over task-specific traces improves performance across six coding benchmarks by 3.7% and establishes design principles for scaling memory and enabling model-to-model transfer.
Key Contributions
- This work introduces Memory Transfer Learning, a paradigm that enables coding agents to leverage a unified memory pool generated from heterogeneous task domains to improve performance on target tasks.
- The research evaluates four distinct memory representations ranging from concrete traces to abstract insights across six coding benchmarks, demonstrating that cross-domain memory improves average performance by 3.7%.
- The study establishes empirical design principles by showing that high-level meta-knowledge like validation routines generalizes effectively, while low-level traces often cause negative transfer due to excessive specificity.
Introduction
Memory-based self-evolution is a critical paradigm for advancing coding agents by allowing them to reuse successful workflows and avoid past errors. While these agents rely on memory to reduce reasoning overhead, existing approaches typically restrict memory utilization to homogeneous task domains, failing to exploit the shared infrastructural foundations like programming languages and runtime environments found across diverse coding problems. The authors investigate Memory Transfer Learning (MTL) by leveraging a unified memory pool from heterogeneous domains to improve agent performance. Their research demonstrates that cross-domain memory improves average performance by 3.7% by transferring high-level meta-knowledge rather than task-specific code. Furthermore, the authors establish that memory abstraction is a key driver of success, finding that high-level insights generalize well while low-level execution traces often lead to negative transfer due to excessive specificity.
Method
The authors design a memory-based coding agent that operates through a two-stage process: memory generation and memory retrieval. This framework is structured to enable the agent to leverage past experiences for improved task performance, particularly in cross-domain scenarios. The overall architecture is illustrated in the figure below, which contrasts three settings: a baseline agent without memory, a self-evolving agent using memory from a single domain, and the proposed memory transfer learning approach that enables knowledge transfer across heterogeneous domains.

During the memory generation phase, the agent performs inference across all benchmarks to collect trajectories, which are sequences of reasoning, actions, and observations. These trajectories form the basis for constructing four distinct memory representations. The first is the Trajectory memory, which preserves the full sequence of actions and their corresponding observations without reasoning steps, capturing detailed execution history. The second is the Workflow memory, which abstracts the trajectory into a reusable sequence of actions aimed at achieving a specific goal, thereby reducing noise from irrelevant details. The third is the Summary memory, where a large language model (LLM) generates a concise summary of the task and a paragraph explaining the success or failure of the inference, providing explicit analysis. The fourth is the Insight memory, which distills the experience into a generalizable format consisting of a title, a brief description, and a content section that encapsulates transferable knowledge without referencing specific implementation details.

Following generation, the memories are indexed and stored in a heterogeneous-domain memory pool, which is constructed by aggregating memories from all benchmarks except the one being tested for each memory type. This ensures that the retrieval process operates on knowledge from different domains. During inference, the agent retrieves the most relevant memories by first generating an embedding for the current task and then computing the cosine similarity between this embedding and the embeddings of the stored memories. The top-N memories with the highest similarity scores are selected and incorporated into the system prompt at the beginning of the inference process, guiding the agent's reasoning and action selection.
The effectiveness of memory transfer is further analyzed through a formal mathematical framework that models the abstraction-transfer tradeoff. This model decomposes a memory embedding into a domain-invariant component, representing transferable meta-knowledge, and a domain-specific component. The abstraction level of a memory is defined as the proportion of its invariant component, and the utility of retrieving a memory for an unseen task is modeled as a trade-off between the transferable guidance provided by the invariant component and the penalty from domain mismatch due to the specific component. This formalization demonstrates that higher abstraction levels lead to greater transfer gains, providing a theoretical foundation for the observed benefits of generalizable memory representations.
Experiment
The researchers evaluated Memory Transfer Learning (MTL) for coding agents across six diverse benchmarks, ranging from competitive programming to repository-level tasks, to validate the effectiveness of cross-domain knowledge transfer. The experiments demonstrate that MTL significantly outperforms zero-shot baselines and existing self-evolving methods by providing high-level meta-knowledge rather than task-specific implementation details. Findings indicate that more abstract memory representations, such as Insights, yield superior transferability and efficiency compared to low-level trajectories, which can lead to brittle execution errors. Furthermore, the study shows that the benefits of MTL scale with the size and diversity of the memory pool and remain effective even when transferring memories across different language models.
The authors evaluate memory transfer learning across multiple coding benchmarks and models, showing consistent performance improvements over zero-shot baselines. Insight memories achieve the highest average performance, indicating that high-level, abstract knowledge transfers more effectively than task-specific details. Memory transfer learning improves performance over zero-shot baselines across different models and benchmarks. Insight memories, which capture high-level procedural guidance, achieve the highest average performance. Performance improves with larger memory pools and greater domain diversity, indicating better access to transferable meta-knowledge.

The authors evaluate memory transfer learning across multiple coding benchmarks and models, showing consistent performance improvements over zero-shot baselines. Insight memory types achieve the highest average gains, indicating that higher abstraction levels enhance transfer effectiveness. Memory transfer learning improves performance across diverse coding benchmarks and models compared to zero-shot settings. Insight memory types consistently outperform other memory types, highlighting the importance of abstraction in transfer learning. Performance gains increase with larger memory pools and greater domain diversity, suggesting that knowledge transfer benefits from broader training data.

The the the table compares different memory retrieval methods across multiple benchmarks, showing that embedding similarity achieves the highest average performance. Adaptive rewriting and LLM reranking perform similarly but are slightly lower than embedding similarity, while the no memory baseline is the lowest. Embedding similarity retrieval method outperforms other methods across benchmarks. Adaptive rewriting and LLM reranking show similar performance, slightly below embedding similarity. The no memory baseline has the lowest performance in all evaluated metrics.

The chart compares memory abstraction levels using DBI and LISI metrics across different memory types. Trajectory and Workflow memories show stronger clustering, while Insight memories exhibit greater mixing and sparsity, indicating higher abstraction and generalization. Insight memories are more abstract and generalized compared to Trajectory and Workflow memories. Higher abstraction levels correlate with greater mixing and weaker clustering in the embedding space. DBI and LISI metrics show that Insight memories have the highest abstraction and the most diverse representation.

The authors evaluate Memory Transfer Learning across multiple coding benchmarks and models, showing consistent performance improvements over zero-shot baselines. Insight memories yield the highest average gains, indicating that abstract, meta-knowledge-based memories are most effective for cross-domain transfer. Memory Transfer Learning consistently improves performance over zero-shot baselines across different models and benchmarks. Insight memories achieve the highest average performance gains, suggesting abstract meta-knowledge is most transferable. Performance improves with larger memory pools and more diverse source domains, enhancing the likelihood of retrieving useful knowledge.

The authors evaluate memory transfer learning across various coding benchmarks and models to validate the effectiveness of different memory types and retrieval methods. Results demonstrate that memory transfer consistently outperforms zero-shot baselines, with high-level insight memories providing the most effective transfer due to their abstract and generalized nature. Furthermore, performance scales with larger, more diverse memory pools, while embedding similarity serves as the most effective retrieval mechanism.