5 months ago

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated success in enhancing LLM reasoning capabilities, but remains limited to single-turn interactions without tool integration. While recent Agentic Reinforcement Learning with Tool use (ARLT) approaches have emerged to address multi-turn tool interactions, existing works develop task-specific codebases that suffer from fragmentation, synchronous execution bottlenecks, and limited extensibility across domains. These inefficiencies hinder broader community adoption and algorithmic innovation. We introduce VerlTool, a unified and modular framework that addresses these limitations through systematic design principles. VerlTool provides four key contributions: (1) upstream alignment with VeRL, ensuring compatibility and simplified maintenance; (2) unified tool management via standardized APIs supporting diverse modalities, including code execution, search, SQL databases, and vision processing; (3) asynchronous rollout execution achieving a nearly 2× speedup by eliminating synchronization bottlenecks; and (4) comprehensive evaluation demonstrating competitive performance across 6 ARLT domains. Our framework formalizes ARLT as multi-turn trajectories with multi-modal observation tokens (text/image/video), extending beyond single-turn RLVR paradigms. We train and evaluate models on mathematical reasoning, knowledge QA, SQL generation, visual reasoning, web search, and software engineering tasks, achieving results comparable to specialized systems while providing unified training infrastructure. The modular plugin architecture enables rapid tool integration requiring only lightweight Python definitions, significantly reducing development overhead and providing a scalable foundation for tool-augmented RL research. Our code is open-sourced at https://github.com/TIGER-AI-Lab/verl-tool.
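
The synchronization bottleneck mentioned in the abstract can be illustrated with a small, self-contained sketch. This is not VerlTool's implementation or API: the function names (`call_tool`, `rollout`, `async_rollouts`, `sync_rollouts`, `ideal_makespans`) and the latency values are hypothetical, chosen only to contrast a per-turn barrier with trajectories that advance independently.

```python
import asyncio

# Hypothetical per-turn tool latencies in milliseconds for three trajectories.
LATENCIES_MS = [
    [30, 10],
    [10, 30],
    [20, 20],
]

async def call_tool(latency_ms: int) -> str:
    """Stand-in for a tool request (code execution, search, ...)."""
    await asyncio.sleep(latency_ms / 1000)
    return f"obs({latency_ms}ms)"

async def rollout(latencies_ms: list[int]) -> list[str]:
    """One multi-turn trajectory: each turn waits only for its own tool call."""
    observations = []
    for latency_ms in latencies_ms:
        observations.append(await call_tool(latency_ms))
    return observations

async def async_rollouts(all_latencies: list[list[int]]) -> list[list[str]]:
    """Trajectory-level asynchrony: no barrier between turns."""
    return list(await asyncio.gather(*(rollout(row) for row in all_latencies)))

async def sync_rollouts(all_latencies: list[list[int]]) -> list[list[str]]:
    """Batch-synchronous baseline: every turn ends with a barrier."""
    results: list[list[str]] = [[] for _ in all_latencies]
    for turn in range(len(all_latencies[0])):
        # All trajectories wait here for the slowest tool call of this turn.
        batch = await asyncio.gather(*(call_tool(row[turn]) for row in all_latencies))
        for trajectory, observation in zip(results, batch):
            trajectory.append(observation)
    return results

def ideal_makespans(all_latencies: list[list[int]]) -> tuple[int, int]:
    """Idealized wall-clock time (ms) for the synchronous and asynchronous schedules."""
    sync_ms = sum(max(column) for column in zip(*all_latencies))
    async_ms = max(sum(row) for row in all_latencies)
    return sync_ms, async_ms

if __name__ == "__main__":
    print(asyncio.run(async_rollouts(LATENCIES_MS)))
    print(ideal_makespans(LATENCIES_MS))
```

Both schedules return the same observations. For these made-up latencies the idealized makespan is 60 ms with a per-turn barrier and 40 ms without one, because the barrier makes every trajectory pay for the slowest tool call in each turn. The actual speedup in any real system depends on the latency distribution and on generation cost, which this sketch ignores.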

Dongfu Jiang, Yi Lu, Zhuofeng Li, Zhiheng Lyu, Ping Nie, Haozhe Wang, Alex Su, Hui Chen, Kai Zou, Chao Du, and 2 more

VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
