ChemBench4K Chemical Ability Evaluation Benchmark Dataset

Date

2 years ago

Size

509.65 KB

Organization

Paper URL

arxiv.org

Background

Large-scale chemistry performance benchmarks

Most existing benchmarks for chemistry tasks, such as MoleculeNet, are designed for expert models that target specific tasks, so they may not be suitable for evaluating LLMs. Most existing LLM benchmarks for chemistry take the form of question answering and use BLEU and ROUGE as evaluation metrics. These metrics can be strongly affected by a model's output style and are ill-suited to settings that emphasize the correctness of scientific facts: an answer that imitates the reference's language style can even obtain a higher score despite containing factual errors. The research team therefore built a chemistry benchmark of multiple-choice questions, following the format of mainstream evaluation sets such as MMLU [30] and C-Eval.
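To make the concern about overlap-based metrics concrete, the sketch below compares a simplified unigram-overlap F1 score (a rough stand-in for ROUGE-1, not the official BLEU or ROUGE implementations) with plain multiple-choice accuracy. The benzene sentences and the helper names `unigram_f1` and `mc_accuracy` are illustrative inventions for this sketch, not ChemBench items or tooling.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1-style F1 over lowercased whitespace tokens (not the official metric)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Counter & Counter keeps the minimum count of each token, i.e. clipped matches.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mc_accuracy(predictions: list[str], answers: list[str]) -> float:
    """Multiple-choice accuracy: a prediction counts only if it names the correct option."""
    correct = sum(p.strip().upper() == a.strip().upper() for p, a in zip(predictions, answers))
    return correct / len(answers)


reference = "The molecular formula of benzene is C6H6"
wrong_but_similar = "The molecular formula of benzene is C6H12"  # factually wrong
right_but_terse = "C6H6"                                        # factually right

print(round(unigram_f1(wrong_but_similar, reference), 3))  # 0.857
print(round(unigram_f1(right_but_terse, reference), 3))    # 0.25
print(mc_accuracy(["B"], ["A"]), mc_accuracy(["A"], ["A"]))  # 0.0 1.0
```

The factually wrong answer scores higher on the overlap metric because it copies the reference's wording, while the correct but terse answer is penalized for its style; multiple-choice accuracy credits only the correct option, regardless of phrasing.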

Dataset Overview

To rigorously evaluate how well language models understand chemistry, the research team introduced ChemBench, a benchmark of 4,100 multiple-choice questions, each with exactly one correct answer. The questions span nine tasks on chemical molecules and reactions, the same nine tasks used in ChemData. The benchmark provides a foundation for objectively measuring the chemistry proficiency of large language models. The figure shows the distribution of questions across all tasks in ChemBench.
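Because every ChemBench item has a single correct option, scoring reduces to checking which option a model picks and averaging per task. The sketch below shows one way this could be done. The `MCQuestion` record layout, the four lettered options, the prompt template, the letter-extraction heuristic, the task labels ("molecule", "reaction"), and the sample questions are all hypothetical illustrations, not the official ChemBench file format or evaluation code.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MCQuestion:
    # Hypothetical record layout; the real ChemBench files may differ.
    task: str
    question: str
    options: dict[str, str]  # option letter -> option text
    answer: str              # letter of the single correct option


def format_prompt(q: MCQuestion) -> str:
    """Render the question and its lettered options as a plain-text prompt."""
    lines = [q.question]
    lines += [f"{letter}. {text}" for letter, text in sorted(q.options.items())]
    lines.append("Answer:")
    return "\n".join(lines)


def extract_choice(output: str, letters: str) -> Optional[str]:
    """Return the first standalone capital option letter in the output (a simple heuristic)."""
    match = re.search(rf"\b([{letters}])\b", output)
    return match.group(1) if match else None


def per_task_accuracy(
    questions: list[MCQuestion], predict: Callable[[str], str]
) -> dict[str, float]:
    """Score each question as right or wrong and report accuracy per task."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for q in questions:
        letters = "".join(sorted(q.options))
        choice = extract_choice(predict(format_prompt(q)), letters)
        total[q.task] += 1
        correct[q.task] += int(choice == q.answer)
    return {task: correct[task] / total[task] for task in total}


# Illustrative questions only; not taken from ChemBench.
questions = [
    MCQuestion("molecule", "What is the molecular formula of benzene?",
               {"A": "C6H6", "B": "C6H12", "C": "C5H5", "D": "C6H14"}, "A"),
    MCQuestion("molecule", "What is the molecular formula of cyclohexane?",
               {"A": "C6H6", "B": "C6H12", "C": "C6H10", "D": "C6H14"}, "B"),
    MCQuestion("reaction", "Which reagent converts an alcohol into an alkyl chloride?",
               {"A": "SOCl2", "B": "NaBH4", "C": "KMnO4", "D": "H2O"}, "A"),
]

# A constant baseline "model" that always answers A.
print(per_task_accuracy(questions, lambda prompt: "The answer is A."))
# {'molecule': 0.5, 'reaction': 1.0}
```

The regular-expression extraction is deliberately crude (for example, a sentence starting with the article "A" would be read as choosing option A); a real harness would likely constrain the output format or compare per-option likelihoods instead.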

Introduction to other open source datasets

Chinese and English versions of the ChemData700K, ChemPref-10K, and C-MHChem datasets are also available.

ChemData700K dataset

ChemData700K is an instruction fine-tuning dataset for improving the chemistry capabilities of large language models. It contains 730K high-quality question-answer pairs covering nine core chemistry tasks, sampled as roughly one tenth of a pool of seven million examples. The dataset covers a wide range of chemical domain knowledge and is organized into three main task categories: molecules, reactions, and fields.

ChemPref-10K dataset

This preference dataset can be used to align language models with human preferences. It is available in both English and Chinese.

C-MHChem dataset

C-MHChem is a high-quality, fully manually written multiple-choice benchmark of 600 questions collected from junior high school, high school, and college entrance examinations held in various parts of China over the past 25 years.

ChemBench4K.torrent

ChemBench4K/
- README.md
  3.08 KB
- README.txt
  6.17 KB

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.
