Command Palette
Search for a command to run...
Nemotron-Math-v2 Mathematical Inference Dataset
Nemotron-Math-v2 is a mathematical inference dataset released by NVIDIA Corporation in 2025. Related research papers include... Nemotron-Math: Efficient Long-Context Distillation of Mathematical Reasoning from Multi-Mode Supervision It is primarily used to train LLMs to perform structured mathematical reasoning, to study the differences between tool-enhanced reasoning and pure language reasoning, and to build long-context or multi-track reasoning systems.
This dataset contains approximately 347,000 high-quality mathematical problems and 7 million model-generated inference trajectories. Each problem is solved in six configurations: high/medium/low inference depth and with or without Python TIR, and the answers are validated via a pipeline using an LLM as the arbiter.
Data Fields:
- Problem: Problem statements extracted from sources such as OpenMathReasoning and MathStackExchange.
- Messages: The user's and assistant's conversation log, used for LLM training.
- expected_answer: The extracted answer or the majority vote answer generated by the model.
- metadata: Pass rate under different reasoning and tool usage scenarios
- data_source: Data source is AoPS or StackExchange-Math
- tool: The tool definition used, or empty.
Build AI with AI
From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.