Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful paradigm for enhancing the reasoning capabilities of LLMs. Existing research has predominantly concentrated on isolated reasoning domains such as mathematical problem-solving, coding tasks, or logical reasoning. However, real-world reasoning scenarios inherently demand an integrated application of multiple cognitive skills. Despite this, the interplay among these reasoning skills under reinforcement learning remains poorly understood. To bridge this gap, we present a systematic investigation of multi-domain reasoning within the RLVR framework, explicitly focusing on three primary domains: mathematical reasoning, code generation, and logical puzzle solving. Our study comprises four key components: (1) Leveraging the GRPO algorithm and the Qwen-2.5-7B model family, we thoroughly evaluate the models' in-domain improvements and cross-domain generalization when trained on single-domain datasets. (2) We examine the intricate interactions, including mutual enhancements and conflicts, that emerge during combined cross-domain training. (3) To better understand the influence of SFT on RL, we analyze and compare performance differences between base and instruct models under identical RL configurations. (4) We investigate critical RL training details, systematically exploring the impacts of curriculum learning strategies, variations in reward design, and language-specific factors. Through extensive experiments, our results offer significant insights into the dynamics governing domain interactions, revealing key factors that influence both specialized and generalizable reasoning performance. These findings provide valuable guidance for optimizing RL methodologies to foster comprehensive, multi-domain reasoning capabilities in LLMs.
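
For context, GRPO (the policy-optimization algorithm used throughout the study) dispenses with a learned value function and instead normalizes verifiable rewards within a group of sampled responses. The sketch below gives the standard formulation from the literature; the clipping threshold $\epsilon$, KL coefficient $\beta$, and reference policy $\pi_{\mathrm{ref}}$ are standard notation and are not necessarily the exact settings used in this work. For a prompt $q$, sample $G$ responses $\{o_1,\dots,o_G\}$ from $\pi_{\theta_{\mathrm{old}}}$, score each with the verifiable reward $r_i$, and compute the group-relative advantage

\[ \hat{A}_i = \frac{r_i - \mathrm{mean}(\{r_1,\dots,r_G\})}{\mathrm{std}(\{r_1,\dots,r_G\})}, \]

which is then plugged into a clipped surrogate objective with a KL penalty toward the reference policy:

\[ \mathcal{J}(\theta) = \mathbb{E}\!\left[\frac{1}{G}\sum_{i=1}^{G} \min\!\left(\frac{\pi_\theta(o_i\mid q)}{\pi_{\theta_{\mathrm{old}}}(o_i\mid q)}\,\hat{A}_i,\; \mathrm{clip}\!\left(\frac{\pi_\theta(o_i\mid q)}{\pi_{\theta_{\mathrm{old}}}(o_i\mid q)},\,1-\epsilon,\,1+\epsilon\right)\hat{A}_i\right)\right] - \beta\, D_{\mathrm{KL}}\!\left(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right). \]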