OpenMathInstruct-2 Math Instruction Tuning Dataset

Date: a year ago

Size: 10.23 GB

Organization: NVIDIA

Paper URL: arxiv.org

OpenMathInstruct-2 is a large-scale open-source math instruction-tuning dataset released by NVIDIA in 2024 to accelerate progress in AI for mathematics. The accompanying paper is "OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data". The dataset contains 14 million question-answer pairs (about 600,000 unique questions), making it nearly 8 times larger than the previous largest dataset of its kind. Fine-tuning the Llama-3.1-8B-Base model on OpenMathInstruct-2 raises its score on the MATH benchmark to 67.8%, which is 15.9 percentage points above Llama-3.1-8B-Instruct (51.9%).

The OpenMathInstruct-2 dataset contains the following fields:

  • problem: The original problem, taken from the GSM8K or MATH training set, or a problem augmented from one of these training sets.
  • generated_solution: The synthetically generated solution.
  • expected_answer: For problems from the training sets, the ground-truth reference answer provided by the source dataset. For augmented problems, the answer obtained by majority vote.
  • problem_source: Indicates whether the problem comes directly from GSM8K or MATH, or is an augmented version derived from one of them.
Example of dataset structure
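
As a minimal sketch of the four fields described above, the following Python snippet models one record and shows one plausible way a majority vote over candidate answers could pick `expected_answer`. The names `Record`, `parse_record`, `majority_vote`, and `sample_row`, the sample values (including the `problem_source` string), and the tie-breaking rule are assumptions made for illustration; they are not taken from the dataset or from NVIDIA's pipeline.

```python
from collections import Counter
from dataclasses import dataclass

# Field names follow the list above; everything else here is hypothetical.
FIELDS = ("problem", "generated_solution", "expected_answer", "problem_source")


@dataclass
class Record:
    problem: str
    generated_solution: str
    expected_answer: str
    problem_source: str


def parse_record(row: dict) -> Record:
    """Build a Record from a dict, failing loudly if a field is missing."""
    missing = [name for name in FIELDS if name not in row]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    return Record(**{name: str(row[name]) for name in FIELDS})


def majority_vote(answers: list[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("no candidate answers")
    return Counter(answers).most_common(1)[0][0]


# Invented sample row, for illustration only.
sample_row = {
    "problem": "What is 2 + 3?",
    "generated_solution": "2 + 3 = 5, so the answer is 5.",
    "expected_answer": majority_vote(["5", "7", "5"]),
    "problem_source": "augmented_gsm8k",  # placeholder value
}

record = parse_record(sample_row)
print(record.expected_answer)
```

Using a dataclass keeps the schema explicit, so a row with a missing field is rejected at load time rather than surfacing later during training.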

OpenMathInstruct-2.torrent
  • OpenMathInstruct-2/
    • README.md (1.85 KB)
    • README.txt (3.7 KB)
    • data/
      • OpenMathInstruct-2.zip (10.23 GB)
