
Abstract
We introduce rStar2-Agent, a 14B math reasoning model trained with agentic reinforcement learning to achieve frontier-level performance. Beyond current long CoT, the model demonstrates advanced cognitive behaviors, such as thinking carefully before using Python coding tools and reflecting on code execution feedback to autonomously explore, verify, and refine intermediate steps in complex problem-solving. This capability is enabled through three key innovations that make agentic RL effective at scale: (i) an efficient RL infrastructure with a reliable Python code environment that supports high-throughput execution and mitigates the high rollout costs, enabling training on limited GPU resources (64 MI300X GPUs); (ii) GRPO-RoC, an agentic RL algorithm with a Resample-on-Correct rollout strategy that addresses the inherent environment noise from coding tools, allowing the model to reason more effectively in a code environment; (iii) an efficient agent training recipe that starts with non-reasoning SFT and progresses through multiple RL stages, yielding advanced cognitive abilities with minimal compute cost. As a result, rStar2-Agent boosts a pre-trained 14B model to state-of-the-art performance in only 510 RL steps within one week, achieving average pass@1 scores of 80.6% on AIME24 and 69.8% on AIME25, surpassing DeepSeek-R1 (671B) with significantly shorter responses. Beyond mathematics, rStar2-Agent-14B also demonstrates strong generalization to alignment, scientific reasoning, and agentic tool-use tasks. Code and training recipes are available at https://github.com/microsoft/rStar.
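The abstract names the GRPO-RoC Resample-on-Correct rollout strategy without spelling out its mechanics, so the following is a minimal, hypothetical Python sketch of one way such a rollout filter could work: oversample rollouts per prompt, keep incorrect ones, and prefer correct ones with the fewest tool-execution errors. The `Rollout` fields, the `resample_on_correct` helper, and the half-and-half split are illustrative assumptions, not the paper's implementation; see the linked repository for the actual algorithm.

```python
import random
from dataclasses import dataclass, field
from typing import List

@dataclass
class Rollout:
    """One sampled trajectory for a prompt (fields here are illustrative)."""
    answer_correct: bool     # final answer matched the reference
    tool_error_count: int    # Python tool calls that raised errors or returned noise
    tokens: List[int] = field(default_factory=list)

def resample_on_correct(rollouts: List[Rollout], group_size: int, seed: int = 0) -> List[Rollout]:
    """Downselect an oversampled pool of rollouts into a GRPO group.

    Incorrect rollouts are kept as sampled (they carry the negative signal),
    while correct rollouts are ranked by how cleanly they used the code tool,
    so trajectories that succeeded despite noisy execution feedback are less
    likely to be reinforced.
    """
    rng = random.Random(seed)
    correct = sorted((r for r in rollouts if r.answer_correct),
                     key=lambda r: r.tool_error_count)
    incorrect = [r for r in rollouts if not r.answer_correct]

    # Reserve roughly half the group for incorrect rollouts, then fill the
    # remainder with the cleanest correct ones (the split ratio is an assumption).
    keep_incorrect = incorrect[: group_size // 2]
    keep_correct = correct[: group_size - len(keep_incorrect)]
    group = keep_incorrect + keep_correct
    rng.shuffle(group)
    return group
```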