Date

6 months ago

Organization

Paper URL

7a783933efcc

License

Apache 2.0

Tags

Reasoning

Benchmarks

FrontierScience is a benchmark dataset for evaluating reasoning and scientific research tasks, released by OpenAI in 2025. The accompanying paper is "FrontierScience: evaluating AI's ability to perform expert-level scientific tasks". The dataset aims to systematically evaluate the capabilities of large models in expert-level scientific reasoning and research sub-tasks. It combines expert-written questions, a two-tier task structure, and an automatic scoring mechanism, and is divided into two subsets that correspond to two types of ability: closed-ended precise reasoning and open-ended scientific research reasoning.

The Olympiad subset consists of original questions designed by medalists and national team coaches from the International Physics, Chemistry, and Biology Olympiads. The difficulty of the questions is comparable to that of top international competitions such as IPhO, IChO, and IBO. It focuses on short-answer reasoning tasks: the model must output a single numerical value, an algebraic expression, or a biological term that can be fuzzily matched, which keeps the results verifiable and the automatic evaluation stable.
The Research subset is written by PhD students, postdoctoral fellows, professors, and other active researchers. The questions simulate sub-problems that may be encountered in real scientific research and cover the three major fields of physics, chemistry, and biology. Each question comes with a fine-grained 10-point scoring rubric that evaluates not only the correctness of the answer but also key aspects of the model's work, including modeling assumptions, reasoning paths, and intermediate conclusions.
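
This page does not describe how the automatic scoring is implemented, so the following Python sketch is only an illustration of the two scoring styles described above, not the official FrontierScience grader. The function names, the 1% relative tolerance, the 0.85 similarity threshold, and the way the 10 rubric points are split across criteria are all assumptions made for the example.

```python
import math
from difflib import SequenceMatcher


def match_numeric(predicted: str, reference: float, rel_tol: float = 0.01) -> bool:
    """Olympiad-style check: does the predicted text parse as a number
    within a relative tolerance of the reference? (tolerance is assumed)"""
    try:
        value = float(predicted.strip())
    except ValueError:
        return False
    return math.isclose(value, reference, rel_tol=rel_tol)


def match_term(predicted: str, reference: str, threshold: float = 0.85) -> bool:
    """Olympiad-style check: fuzzy match of a short term, ignoring case
    and extra whitespace. (threshold is assumed)"""
    a = " ".join(predicted.lower().split())
    b = " ".join(reference.lower().split())
    return SequenceMatcher(None, a, b).ratio() >= threshold


def rubric_score(awarded: dict[str, float], rubric: dict[str, float]) -> float:
    """Research-style check: sum the points awarded per criterion,
    clamping each to the range [0, criterion maximum]."""
    if not math.isclose(sum(rubric.values()), 10.0):
        raise ValueError("rubric maxima are expected to total 10 points")
    return sum(
        min(max(awarded.get(name, 0.0), 0.0), maximum)
        for name, maximum in rubric.items()
    )


# Hypothetical rubric: the criteria come from the description above,
# but the point split is invented for illustration.
example_rubric = {
    "modeling assumptions": 3.0,
    "reasoning path": 4.0,
    "intermediate conclusions": 2.0,
    "final answer": 1.0,
}
example_awarded = {
    "modeling assumptions": 3.0,
    "reasoning path": 2.0,
    "final answer": 1.0,
}
example_total = rubric_score(example_awarded, example_rubric)
```

Matching algebraic expressions is left out here because it needs symbolic equivalence checking rather than string comparison. In practice the per-criterion points would also have to come from a human or model judge, which this sketch takes as given.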

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.

Related Datasets

Creative Professionals Creative Task Instruction Dataset

2 months ago

THINGS-MEG Magnetoencephalography Dataset

5 months ago

THINGS-fMRI Functional Magnetic Resonance Imaging Dataset

5 months ago

Nemotron-Math-v2 Mathematical Inference Dataset

5 months ago

MCIF Multimodal Cross-Language Instruction Following Dataset

5 months ago

TxT360-3efforts Multi-Task Inference Dataset

5 months ago

LongBench-Pro Long Context Comprehensive Evaluation Dataset

6 months ago
