
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Fanqi Wan, Weizhou Shen, Shengyi Liao, Yingcheng Shi, Chenliang Li, Ziyi Yang, Ji Zhang, Fei Huang, Jingren Zhou, Ming Yan
Published: 5/26/2025
Abstract

Recent large reasoning models (LRMs) have demonstrated strong reasoning capabilities through reinforcement learning (RL). These improvements have primarily been observed within short-context reasoning tasks. In contrast, extending LRMs to effectively process and reason over long-context inputs via RL remains a critical unsolved challenge. To bridge this gap, we first formalize the paradigm of long-context reasoning RL, and identify key challenges in suboptimal training efficiency and an unstable optimization process. To address these issues, we propose QwenLong-L1, a framework that adapts short-context LRMs to long-context scenarios via progressive context scaling. Specifically, we utilize a warm-up supervised fine-tuning (SFT) stage to establish a robust initial policy, followed by a curriculum-guided phased RL technique to stabilize the policy evolution, enhanced with a difficulty-aware retrospective sampling strategy to incentivize policy exploration. Experiments on seven long-context document question-answering benchmarks demonstrate that QwenLong-L1-32B outperforms flagship LRMs such as OpenAI-o3-mini and Qwen3-235B-A22B and achieves performance on par with Claude-3.7-Sonnet-Thinking, demonstrating leading performance among state-of-the-art LRMs. This work advances the development of practical long-context LRMs capable of robust reasoning across information-intensive environments.
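The training recipe described above (progressive context scaling across RL phases, with hard examples from earlier phases resampled retrospectively) can be sketched as a data-scheduling loop. This is a minimal illustration only: the function names, the context-length stages, and the difficulty heuristic are assumptions for exposition, not the paper's actual implementation.

```python
import random

def difficulty(example):
    # Hypothetical heuristic: lower historical accuracy => harder example.
    return 1.0 - example["past_accuracy"]

def phased_rl_schedule(dataset, context_stages=(4_000, 16_000, 60_000),
                       retro_fraction=0.3):
    """Sketch of curriculum-guided phased RL with difficulty-aware
    retrospective sampling. Yields (context_budget, training_pool)
    for each curriculum phase."""
    seen_hard = []  # hard examples carried over from earlier phases
    for max_len in context_stages:
        # Phase: restrict training to inputs within the current context budget.
        phase_pool = [ex for ex in dataset if ex["length"] <= max_len]
        # Retrospective sampling: mix in the hardest earlier-phase examples.
        n_retro = int(retro_fraction * len(phase_pool))
        retro = sorted(seen_hard, key=difficulty, reverse=True)[:n_retro]
        batch_pool = phase_pool + retro
        random.shuffle(batch_pool)
        yield max_len, batch_pool
        # Retain this phase's hardest examples for later phases.
        seen_hard.extend(
            sorted(phase_pool, key=difficulty, reverse=True)[:n_retro])
```

Each yielded pool would then be consumed by the RL optimizer for that phase; the point of the staged budgets is that the policy is never asked to jump directly from short contexts to the longest ones.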