
Nemotron-Pretraining-SFT-v1 Supervised Fine-Tuning Dataset

Date

2 months ago

Organization

NVIDIA

Paper URL

2508.14444

License

Other


Nemotron-Pretraining-SFT-v1 is a synthetic supervised fine-tuning (SFT) dataset released by NVIDIA in 2025. It accompanies the paper "NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model" and aims to enhance the model's capabilities in instruction following, reasoning, code, and general question answering.

The dataset targets STEM, academic, logical-reasoning, and multilingual scenarios. It is generated by expanding high-quality mathematics and science materials, combining graduate-level academic texts with instruction-tuned SFT data to construct complex multiple-choice and analytical questions (with complete answers and solution ideas), covering tasks such as mathematics, coding, general knowledge, and logical reasoning.

In NVIDIA's official statistics for the Nemotron pre-training data, SFT-related categories (such as Math SFT, Code SFT, and General SFT) account for a significant share, and the accompanying metadata makes it easy to filter the desired subsets for reproducible experiments.
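As a minimal sketch of metadata-based subset filtering, the snippet below selects records by a category label. Note that the field name `category` and values like `"Math SFT"` are assumptions for illustration; consult the dataset card for the actual schema and field names.

```python
# Sketch: filter SFT records by a category metadata field.
# The field name "category" and label values are hypothetical;
# check the dataset card for the real schema.

def filter_by_category(records, wanted):
    """Keep only records whose category label is in `wanted`."""
    return [r for r in records if r.get("category") in wanted]

# Small in-memory example standing in for real dataset rows.
records = [
    {"category": "Math SFT", "text": "Solve 2x + 3 = 7."},
    {"category": "Code SFT", "text": "Write a function that reverses a list."},
    {"category": "General SFT", "text": "Summarize the paragraph."},
]

math_only = filter_by_category(records, {"Math SFT"})
print(len(math_only))  # → 1
```

The same pattern extends to streaming loaders (e.g. a `filter` call in a dataset-loading library), so a subset can be selected without materializing the full corpus first.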

