Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Abstract
Parallel thinking has emerged as a novel approach for enhancing the reasoning capabilities of large language models (LLMs) by exploring multiple reasoning paths concurrently. However, activating such capabilities through training remains challenging, as existing methods predominantly rely on supervised fine-tuning (SFT) over synthetic data, which encourages teacher-forced imitation rather than exploration and generalization. In contrast, we propose Parallel-R1, the first reinforcement learning (RL) framework that enables parallel thinking behaviors for complex real-world reasoning tasks. Our framework employs a progressive curriculum that explicitly addresses the cold-start problem in training parallel thinking with RL. We first use SFT on prompt-generated trajectories from easier tasks to instill the parallel thinking ability, then transition to RL to explore and generalize this skill on harder problems. Experiments on various math benchmarks, including MATH, AMC23, and AIME, show that Parallel-R1 successfully instills parallel thinking, leading to 8.4% accuracy improvements over the sequential thinking model trained directly on challenging tasks with RL. Further analysis reveals a clear shift in the model's thinking behavior: at an early stage, it uses parallel thinking as an exploration strategy, while in a later stage, it uses the same capability for multi-perspective verification. Most significantly, we validate parallel thinking as a mid-training exploration scaffold, where this temporary exploratory phase unlocks a higher performance ceiling after RL, yielding a 42.9% improvement over the baseline on AIME25. Our model, data, and code will be open-sourced at https://github.com/zhengkid/Parallel-R1.
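The two-stage curriculum described above (an SFT cold start on easier tasks, followed by RL on harder ones) can be summarized as a minimal training-loop sketch. All names below — sft_step, rl_step, and the dataset iterables — are hypothetical placeholders for illustration, not the authors' actual implementation or API.

```python
# Minimal sketch of the progressive curriculum: Stage 1 instills the
# parallel-thinking format via SFT; Stage 2 generalizes it with RL.
# Every function here is a hypothetical stub, not the paper's code.

def sft_step(model, batch):
    """Stage 1: teacher-forced imitation on prompt-generated
    parallel-thinking trajectories from easier problems (placeholder)."""
    return model

def rl_step(model, batch):
    """Stage 2: RL update on harder problems — sample rollouts,
    score them with a verifiable math reward, update the policy
    (placeholder)."""
    return model

def train(model, easy_sft_data, hard_rl_data, sft_epochs=1, rl_steps=1000):
    # Cold start: instill the parallel-thinking ability with SFT.
    for _ in range(sft_epochs):
        for batch in easy_sft_data:
            model = sft_step(model, batch)
    # Transition to RL to explore and generalize the skill.
    for _, batch in zip(range(rl_steps), hard_rl_data):
        model = rl_step(model, batch)
    return model
```

The key design choice this sketch captures is ordering: SFT supplies the behavioral prior that RL alone struggles to discover (the cold-start problem), and RL then serves as the exploration phase that the abstract credits with the final performance gains.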