AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

In this paper, we introduce a novel learning paradigm for adaptive Large Language Model (LLM) agents that eliminates the need for fine-tuning the underlying LLMs. Existing approaches are often either rigid, relying on static, handcrafted reflection workflows, or computationally intensive, requiring gradient updates of LLM model parameters. In contrast, our method enables low-cost continual adaptation via memory-based online reinforcement learning. We formalise this as a Memory-augmented Markov Decision Process (M-MDP), equipped with a neural case-selection policy to guide action decisions. Past experiences are stored in an episodic memory, either differentiable or non-parametric. The policy is continually updated based on environmental feedback through a memory rewriting mechanism, while policy improvement is achieved through efficient memory reading (retrieval). We instantiate our agent model, AgentFly, in the deep research setting, where it attains top-1 on the GAIA validation set (87.88% Pass@3) and 79.40% on the test set. It reaches 66.6% F1 and 80.4% PM on the DeepResearcher dataset, outperforming the state-of-the-art training-based method, while case-based memory adds 4.7 to 9.6 absolute percentage points on out-of-distribution tasks. Our approach offers a scalable and efficient pathway for developing generalist LLM agents capable of continuous, real-time learning without gradient updates, advancing machine learning towards open-ended skill acquisition and deep research scenarios. The code is available at https://github.com/Agent-on-the-Fly/AgentFly.
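To make the memory-based adaptation loop concrete, below is a minimal illustrative sketch (not the paper's implementation) of a non-parametric episodic memory with case-based retrieval. The class and function names (`EpisodicMemory`, `write`, `read`) and the reward-weighted similarity scoring are assumptions introduced for exposition only; the actual case-selection policy and memory mechanism are described in the paper and repository.

```python
# Illustrative sketch only: a non-parametric episodic memory storing
# (embedding, case, reward) triples. Retrieval of similar past cases stands in
# for "memory reading"; appending new experiences with environmental feedback
# stands in for "memory rewriting". Names and scoring are hypothetical.
import numpy as np


class EpisodicMemory:
    def __init__(self):
        self.keys = []     # embeddings of past task states
        self.cases = []    # stored experiences (e.g. plans, tool traces)
        self.rewards = []  # environmental feedback for each case

    def write(self, key: np.ndarray, case: str, reward: float) -> None:
        """Memory rewriting: append a new experience with its reward."""
        self.keys.append(key / (np.linalg.norm(key) + 1e-8))
        self.cases.append(case)
        self.rewards.append(reward)

    def read(self, query: np.ndarray, k: int = 3) -> list:
        """Memory reading: retrieve up to k similar, high-reward cases."""
        if not self.keys:
            return []
        q = query / (np.linalg.norm(query) + 1e-8)
        sims = np.stack(self.keys) @ q
        # Score cases by similarity weighted by past reward (a simple choice).
        scores = sims * np.asarray(self.rewards)
        top = np.argsort(scores)[-k:][::-1]
        return [self.cases[i] for i in top]


# Usage: retrieved cases would be injected into the LLM prompt as guidance,
# so the agent adapts online without any gradient updates to the LLM itself.
memory = EpisodicMemory()
memory.write(np.random.rand(8), "decompose the task, then search the web", reward=1.0)
print(memory.read(np.random.rand(8)))
```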