Nemotron-Pretraining-SFT-v1 Supervised fine-tuning Dataset
Nemotron-Pretraining-SFT-v1 is a synthetic supervised fine-tuning (SFT) dataset released by NVIDIA in 2025. It accompanies the paper "NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model" and aims to strengthen model capabilities in tasks such as instruction following, reasoning, coding, and general question answering.
The dataset targets STEM, academic, logical-reasoning, and multilingual scenarios. It is expanded and generated from high-quality mathematics and science materials, combining graduate-level academic texts with instruction-tuned SFT data to construct complex multiple-choice and analytical questions (with complete answers and solution steps), covering tasks such as mathematics, coding, general knowledge, and logical reasoning.
In the official statistics for the Nemotron pre-training data, SFT-related categories (such as Math SFT, Code SFT, and General SFT) account for a significant share, and per-sample metadata makes it straightforward to filter the subsets needed for reproducible experiments.
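To illustrate the metadata-based filtering described above, here is a minimal Python sketch. The `category` field name and its values are hypothetical stand-ins for the dataset's actual metadata schema; consult the dataset card for the real field names before adapting this.

```python
# Hypothetical sketch: filtering SFT samples by a metadata category field.
# The field name "category" and values like "Math SFT" are assumptions,
# not the dataset's documented schema.

def filter_by_category(samples, wanted):
    """Return only the samples whose metadata category is in `wanted`."""
    return [s for s in samples if s.get("category") in wanted]

# Toy records standing in for dataset rows.
samples = [
    {"category": "Math SFT", "text": "Solve x^2 - 4 = 0."},
    {"category": "Code SFT", "text": "Write a function that reverses a string."},
    {"category": "General SFT", "text": "Summarize the water cycle."},
]

math_only = filter_by_category(samples, {"Math SFT"})
print(len(math_only))  # → 1
```

The same pattern scales to real dataset loaders (for example, a streaming iterator over the published files), since the filter only inspects one metadata key per record.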