Eurus-2-RL-Data Mathematical Programming Problem Training Dataset
Eurus-2-RL-Data is a high-quality dataset built for reinforcement learning training on mathematical and programming problems. It accompanies the blog post "Process Reinforcement through Implicit Rewards".
The math problems in this dataset are partly derived from NuminaMath-CoT and cover a wide range of topics, from Chinese high school mathematics to the International Mathematical Olympiad. The programming problems come from multiple sources, including APPS, CodeContests, TACO, and Codeforces, and are mainly at programming-competition level.

To ensure data quality, Eurus-2-RL-Data has been rigorously cleaned and filtered. The math problems were screened with advanced reasoning models such as Qwen-QwQ to remove questions that are unsolvable, mismatched with their answers, or paired with wrong answers, and multiple-choice questions were converted into open-ended questions. For the programming problems, the main cleaning step was removing duplicates.

After this processing, the dataset contains about 455k math problems and 27k programming problems. It is intended for reinforcement learning training of models on mathematical reasoning and competition-level programming, where it gives the model verified problems to learn from and optimize against.
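The duplicate-removal step for programming problems can be illustrated with a minimal sketch. The description above does not say how duplicates were detected, so the approach here (hashing a whitespace- and case-normalized problem statement) is an assumption, as are the record fields `source` and `prompt` and the sample records themselves.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting
    differences do not hide duplicates."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(problems: list[dict]) -> list[dict]:
    """Keep the first occurrence of each problem statement, preserving order.

    Assumption: exact matching on the normalized statement; the actual
    dataset pipeline may use a different criterion.
    """
    seen: set[str] = set()
    unique = []
    for problem in problems:
        digest = hashlib.sha256(
            normalize(problem["prompt"]).encode("utf-8")
        ).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(problem)
    return unique


# Hypothetical records pooled from several sources (illustrative only).
pooled = [
    {"source": "APPS", "prompt": "Print the sum of two integers."},
    {"source": "TACO", "prompt": "print the  sum of two integers. "},
    {"source": "Codeforces", "prompt": "Find the longest palindromic substring."},
]

unique = deduplicate(pooled)
print(len(unique), [p["source"] for p in unique])
```

Exact matching of this kind only catches copies that differ in case or spacing. Problems that were reworded across platforms would need a fuzzier comparison, which this sketch does not attempt.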