12 days ago

LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts

Siyuan Wang, Gaokai Zhang, Li Lyna Zhang, Ning Shang, Fan Yang, Dongyao Chen, Mao Yang

Abstract

Reasoning over long contexts is essential for large language models. While reinforcement learning (RL) enhances short-context reasoning by inducing "Aha" moments in chain-of-thought, the advanced thinking patterns required for long-context reasoning remain largely unexplored, and high-difficulty RL data are scarce. In this paper, we introduce LoongRL, a data-driven RL method for advanced long-context reasoning. Central to LoongRL is KeyChain, a synthesis approach that transforms short multi-hop QA into high-difficulty long-context tasks by inserting UUID chains that hide the true question among large collections of distracting documents. Solving these tasks requires the model to trace the correct chain step-by-step, identify the true question, retrieve relevant facts and reason over them to answer correctly. RL training on KeyChain data induces an emergent plan-retrieve-reason-recheck reasoning pattern that generalizes far beyond training length. Models trained at 16K effectively solve 128K tasks without prohibitive full-length RL rollout costs. On Qwen2.5-7B and 14B, LoongRL substantially improves long-context multi-hop QA accuracy by +23.5% and +21.1% absolute gains. The resulting LoongRL-14B reaches a score of 74.2, rivaling much larger frontier models such as o3-mini (74.5) and DeepSeek-R1 (74.9). It also improves long-context retrieval, passes all 128K needle-in-a-haystack stress tests, and preserves short-context reasoning capabilities.
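
To make the KeyChain idea concrete, the following is a minimal Python sketch of how a short QA item might be wrapped in UUID chains and scattered among distracting documents. The abstract does not specify the data format, so everything here is an assumption made for illustration: the function names `build_keychain_task` and `trace_chain`, the `key -> value` line format, the use of distractor chains that end in decoy questions, and the prompt wording are all hypothetical and are not taken from the paper.

```python
import random
import uuid


def build_keychain_task(question, documents, distractor_questions,
                        chain_length=3, seed=0):
    """Hide `question` behind a chain of UUID keys mixed into `documents`.

    Hypothetical format: each chain link is a line "key -> value", where the
    value is either the next key or, at the end of the chain, a question.
    """
    rng = random.Random(seed)

    def new_key():
        # Seeded UUID4-style key so the example is reproducible.
        return str(uuid.UUID(int=rng.getrandbits(128), version=4))

    def make_chain(final_question):
        keys = [new_key() for _ in range(chain_length)]
        links = [(keys[i], keys[i + 1]) for i in range(chain_length - 1)]
        links.append((keys[-1], final_question))  # last key reveals a question
        return keys[0], links

    # One true chain, plus distractor chains that lead to decoy questions.
    start_key, links = make_chain(question)
    for decoy in distractor_questions:
        _, decoy_links = make_chain(decoy)
        links.extend(decoy_links)

    # Scatter chain links among the documents.
    segments = [f"Document: {d}" for d in documents]
    segments += [f"{k} -> {v}" for k, v in links]
    rng.shuffle(segments)

    prompt = (
        "\n\n".join(segments)
        + f"\n\nStart from key {start_key}, follow the chain to find the "
        "question, then answer it using the documents."
    )
    return {"prompt": prompt, "start_key": start_key, "pairs": dict(links)}


def trace_chain(pairs, start_key):
    """Follow key -> value links until the value is no longer a key."""
    current = start_key
    while current in pairs:
        current = pairs[current]
    return current


task = build_keychain_task(
    question="Which river flows through the capital of Austria?",
    documents=["Vienna is the capital of Austria.",
               "The Danube flows through Vienna.",
               "The Seine flows through Paris."],
    distractor_questions=["Which river flows through Paris?"],
)
print(trace_chain(task["pairs"], task["start_key"]))
```

In this sketch, only the chain that begins at `start_key` leads to the true question; a model that latches onto a decoy chain ends up answering the wrong question, which mirrors the step-by-step tracing the abstract describes.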
