HyperAI

OpenCodeReasoning Programming Reasoning Dataset

Date: 2 months ago

Size: 8.11 GB

Organization: NVIDIA

Publish URL: huggingface.co

OpenCodeReasoning is a large-scale synthetic programming reasoning dataset released by NVIDIA in 2025. It aims to provide high-quality reasoning data for training large language models (LLMs) and to improve their code generation and logical reasoning capabilities. The accompanying paper is "OpenCodeReasoning: Advancing Data Distillation for Competitive Coding".

The dataset contains 735,255 samples covering 28,319 unique programming questions, making it one of the largest reasoning-focused programming datasets currently available.
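
As a quick sanity check on these figures, the ratio of samples to unique questions gives the average number of responses per question. The short Python sketch below does that arithmetic using only the two counts quoted above; the variable names are illustrative.

```python
# Counts quoted in the dataset description.
total_samples = 735_255
unique_questions = 28_319

# Average number of samples (responses) per unique question.
avg_per_question = total_samples / unique_questions

print(f"{avg_per_question:.2f} samples per question on average")  # 25.96
```

Each question therefore has roughly 26 responses on average. How those responses are distributed across individual questions is not stated here.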

Data source:

  • It integrates questions from 11 mainstream programming platforms, including CodeForces, CodeChef, and LeetCode, along with public datasets such as TACO, APPS, and CodeContests.
  • The code responses are generated by the R1 model (DeepSeek-R1), which keeps the data and its reasoning style consistent.
OpenCodeReasoning.torrent
  • OpenCodeReasoning/
    • README.md (1.49 KB)
    • README.txt (2.98 KB)
    • data/
      • OpenCodeReasoning.zip (8.11 GB)
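
Before extracting an 8.11 GB archive, it can help to inspect its contents first. The following is a minimal sketch using Python's standard `zipfile` module. The helper name `list_archive` is hypothetical, and the path assumes the torrent was downloaded into the current directory with the layout listed above. The internal structure of the zip is not documented here, so the sketch only lists members without assuming any file format.

```python
import zipfile
from pathlib import Path


def list_archive(zip_path):
    """Return (member name, uncompressed size in bytes) pairs for a zip file."""
    with zipfile.ZipFile(zip_path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


if __name__ == "__main__":
    # Assumed location after the torrent download completes; adjust as needed.
    archive_path = Path("OpenCodeReasoning/data/OpenCodeReasoning.zip")
    if archive_path.exists():
        for name, size in list_archive(archive_path):
            print(f"{size:>15,d}  {name}")
    else:
        print(f"{archive_path} not found")
```

Listing members reads only the archive's central directory, so it stays fast even for a large file.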