HyperAI

OpenCodeReasoning Programming Reasoning Dataset

Date

18 days ago

Organization

NVIDIA

Publish URL

huggingface.co

OpenCodeReasoning is a large-scale synthetic programming reasoning dataset released by NVIDIA in 2025. It aims to provide high-quality programming reasoning training data for large language models (LLMs) and to improve their code generation and logical reasoning capabilities. The accompanying paper is "OpenCodeReasoning: Advancing Data Distillation for Competitive Coding".

The dataset contains 735,255 samples covering 28,319 unique programming questions, making it one of the largest reasoning-focused programming datasets currently available.
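To put the two figures in proportion, the short sketch below computes the average number of samples per unique question. It uses only the counts quoted above; the function and variable names are illustrative, and the result is a mean only, since how samples are actually distributed across questions is not stated here.

```python
# Counts quoted in the dataset description.
TOTAL_SAMPLES = 735_255
UNIQUE_QUESTIONS = 28_319


def average_samples_per_question(samples: int, questions: int) -> float:
    """Return the mean number of samples per unique question."""
    if questions <= 0:
        raise ValueError("questions must be a positive integer")
    return samples / questions


avg = average_samples_per_question(TOTAL_SAMPLES, UNIQUE_QUESTIONS)
print(f"{avg:.2f} samples per question on average")
```

On average, then, each question appears with roughly 26 samples, which suggests that many questions carry multiple generated responses.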

Data source:

  • It integrates questions from 11 mainstream programming platforms, including CodeForces, CodeChef, and LeetCode, drawn from public datasets such as TACO, APPS, and CodeContests.
  • The code responses are generated by a single model, DeepSeek-R1, which keeps the data consistent and the reasoning format standardized.