Distilling LLM Agent into Small Models with Retrieval and Code Tools

Minki Kang, Jongwon Jeong, Seanie Lee, Jaewoong Cho, Sung Ju Hwang
Release Date: 5/26/2025
Abstract

Large language models (LLMs) excel at complex reasoning tasks but remain computationally expensive, limiting their practical deployment. To address this, recent works have focused on distilling reasoning capabilities into smaller language models (sLMs) using chain-of-thought (CoT) traces from teacher LLMs. However, this approach struggles in scenarios requiring rare factual knowledge or precise computation, where sLMs often hallucinate due to limited capability. In this work, we propose Agent Distillation, a framework for transferring not only reasoning capability but full task-solving behavior from LLM-based agents into sLMs equipped with retrieval and code tools. We improve agent distillation along two complementary axes: (1) we introduce a prompting method called first-thought prefix to enhance the quality of teacher-generated trajectories; and (2) we propose self-consistent action generation to improve the test-time robustness of small agents. We evaluate our method on eight reasoning tasks across factual and mathematical domains, covering both in-domain and out-of-domain generalization. Our results show that sLMs as small as 0.5B, 1.5B, and 3B parameters can achieve performance competitive with next-tier larger models (1.5B, 3B, and 7B) fine-tuned using CoT distillation, demonstrating the potential of agent distillation for building practical, tool-using small agents. Our code is available at https://github.com/Nardien/agent-distillation.
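The abstract mentions self-consistent action generation for test-time robustness. The paper's exact procedure is not given here; a minimal sketch, assuming it resembles the common self-consistency heuristic, is to sample several candidate actions from the small agent and keep the most frequent one by majority vote. The `sample_action` callable and the toy action strings below are hypothetical stand-ins, not names from the paper.

```python
from collections import Counter
from typing import Callable, List


def self_consistent_action(sample_action: Callable[[], str], n_samples: int = 5) -> str:
    """Sample several candidate actions and return the most frequent one.

    This is a generic majority-vote self-consistency sketch; the paper's
    actual method may score or filter actions differently.
    """
    samples: List[str] = [sample_action() for _ in range(n_samples)]
    # Counter.most_common(1) returns the highest-count action;
    # ties resolve to the first-seen candidate (Python 3.7+ ordering).
    return Counter(samples).most_common(1)[0][0]


# Toy stand-in for a small agent's stochastic action sampler.
_draws = iter([
    "search('capital of X')",
    "calc(2 + 2)",
    "search('capital of X')",
    "search('capital of X')",
    "calc(2 + 2)",
])
action = self_consistent_action(lambda: next(_draws), n_samples=5)
```

Here the retrieval action wins 3 votes to 2, so a single off-distribution sample does not derail the agent's next step.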