AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

In this paper, we introduce a novel learning paradigm for adaptive Large Language Model (LLM) agents that eliminates the need for fine-tuning the underlying LLMs. Existing approaches are often either rigid, relying on static, handcrafted reflection workflows, or computationally intensive, requiring gradient updates of LLM model parameters. In contrast, our method enables low-cost continual adaptation via memory-based online reinforcement learning. We formalise this as a Memory-augmented Markov Decision Process (M-MDP), equipped with a neural case-selection policy to guide action decisions. Past experiences are stored in an episodic memory, either differentiable or non-parametric. The policy is continually updated based on environmental feedback through a memory rewriting mechanism, while policy improvement is achieved through efficient memory reading (retrieval). We instantiate our agent model, AgentFly, in the deep research setting, where it attains top-1 on the GAIA validation set (87.88% Pass@3) and 79.40% on the test set. It reaches 66.6% F1 and 80.4% PM on the DeepResearcher dataset, outperforming the state-of-the-art training-based method, while case-based memory adds 4.7 to 9.6 absolute percentage points on out-of-distribution tasks. Our approach offers a scalable and efficient pathway for developing generalist LLM agents capable of continuous, real-time learning without gradient updates, advancing machine learning towards open-ended skill acquisition and deep research scenarios. The code is available at https://github.com/Agent-on-the-Fly/AgentFly.
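To make the memory-based adaptation loop concrete, below is a minimal illustrative sketch (not the paper's implementation) of a non-parametric episodic memory with case-based retrieval. The class and function names (`EpisodicMemory`, `write`, `read`) and the reward-weighted similarity scoring are assumptions introduced for exposition only; the actual case-selection policy and memory mechanism are described in the paper and repository.

```python
# Illustrative sketch only: a non-parametric episodic memory storing
# (embedding, case, reward) triples. Retrieval of similar past cases stands in
# for "memory reading"; appending new experiences with environmental feedback
# stands in for "memory rewriting". Names and scoring are hypothetical.
import numpy as np


class EpisodicMemory:
    def __init__(self):
        self.keys = []     # embeddings of past task states
        self.cases = []    # stored experiences (e.g. plans, tool traces)
        self.rewards = []  # environmental feedback for each case

    def write(self, key: np.ndarray, case: str, reward: float) -> None:
        """Memory rewriting: append a new experience with its reward."""
        self.keys.append(key / (np.linalg.norm(key) + 1e-8))
        self.cases.append(case)
        self.rewards.append(reward)

    def read(self, query: np.ndarray, k: int = 3) -> list:
        """Memory reading: retrieve up to k similar, high-reward cases."""
        if not self.keys:
            return []
        q = query / (np.linalg.norm(query) + 1e-8)
        sims = np.stack(self.keys) @ q
        # Score cases by similarity weighted by past reward (a simple choice).
        scores = sims * np.asarray(self.rewards)
        top = np.argsort(scores)[-k:][::-1]
        return [self.cases[i] for i in top]


# Usage: retrieved cases would be injected into the LLM prompt as guidance,
# so the agent adapts online without any gradient updates to the LLM itself.
memory = EpisodicMemory()
memory.write(np.random.rand(8), "decompose the task, then search the web", reward=1.0)
print(memory.read(np.random.rand(8)))
```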