Scaling Test-time Compute for LLM Agents

King Zhu, Hanhao Li, Siwei Wu, Tianshun Xing, Dehua Ma, Xiangru Tang, Minghao Liu, Jian Yang, Jiaheng Liu, Yuchen Eleanor Jiang, Changwang Zhang, Chenghua Lin, Jun Wang, Ge Zhang, Wangchunshu Zhou
Release Date: 6/18/2025
Abstract

Scaling test-time compute has shown remarkable success in improving the reasoning abilities of large language models (LLMs). In this work, we conduct the first systematic exploration of applying test-time scaling methods to language agents and investigate the extent to which doing so improves their effectiveness. Specifically, we explore several test-time scaling strategies, including: (1) parallel sampling algorithms; (2) sequential revision strategies; (3) verifiers and merging methods; and (4) strategies for diversifying rollouts. We carefully analyze and ablate the impact of these design choices when applying test-time scaling to language agents, and arrive at the following findings: (1) scaling test-time compute improves agent performance; (2) knowing when to reflect is important for agents; (3) among the verification and result-merging approaches, the list-wise method performs best; and (4) increasing the diversity of rollouts has a positive effect on the agent's task performance.
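To make the first and third strategies concrete, here is a minimal sketch of parallel sampling combined with list-wise verification. The `agent.run(task, seed=...)` and `verifier.rank(task, rollouts)` interfaces are hypothetical stand-ins, not APIs from the paper; the point is only the shape of the pipeline: sample several diversified rollouts in parallel, then let a verifier that sees all candidates at once pick the best one.

```python
# Minimal sketch of parallel sampling + list-wise verification.
# `agent` and `verifier` are hypothetical objects assumed to expose:
#   agent.run(task, seed)         -> one complete rollout for the task
#   verifier.rank(task, rollouts) -> indices of rollouts, best first
from concurrent.futures import ThreadPoolExecutor


def best_of_n(agent, verifier, task, n=8):
    """Sample n rollouts in parallel, then let a list-wise verifier pick one."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        # Varying the seed diversifies the rollouts (cf. finding 4).
        rollouts = list(pool.map(lambda s: agent.run(task, seed=s), range(n)))

    # A list-wise verifier scores all candidates jointly, which the abstract
    # reports outperforms other verification and merging approaches.
    ranking = verifier.rank(task, rollouts)
    return rollouts[ranking[0]]
```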
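Finding 2 ("knowing when to reflect is important") suggests that sequential revision should be gated rather than applied on every round. The sketch below illustrates one such loop; the `needs_reflection` and `reflect` methods are assumptions for illustration, not part of the paper's interface.

```python
def revise_until_confident(agent, task, max_rounds=3):
    """Sequential revision: reflect on the previous attempt only when a
    self-check flags it, since reflecting at the wrong time can waste
    compute or degrade an already-correct answer."""
    attempt = agent.run(task)
    for _ in range(max_rounds):
        # Hypothetical self-check deciding *when* to reflect (cf. finding 2).
        if not agent.needs_reflection(task, attempt):
            break
        critique = agent.reflect(task, attempt)
        attempt = agent.run(task, feedback=critique)
    return attempt
```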