HyperAI

ChemBench4K Chemical Ability Evaluation Benchmark Dataset

Date: 10 months ago

Size: 509.65 KB

Organization: Shanghai Artificial Intelligence Laboratory

Publish URL: huggingface.co


This dataset was open-sourced by the Shanghai Artificial Intelligence Laboratory in 2024 together with its first large model for science, the Pu Ke chemistry large model (ChemLLM). The associated paper is "ChemLLM: A Chemical Large Language Model".

The release centers on ChemBench-4K. The research team also open-sourced Chinese and English versions of the ChemData700K, ChemPref-10K, and C-MHChem datasets.

Background

Large-scale chemistry performance benchmarks

Most existing benchmarks for chemistry tasks, such as MoleculeNet, are designed for expert models trained on specific tasks and may not be suitable for testing LLMs. Most existing chemistry benchmarks for large language models take the form of open-ended question answering and use BLEU and ROUGE as evaluation metrics. These metrics can be strongly affected by the output style of the language model, so they are a poor fit for scenarios that emphasize the correctness of scientific facts: an answer written in a style similar to the reference can obtain a higher score even when it contains factual errors. The research team therefore chose to construct a chemistry benchmark consisting of multiple-choice questions, similar to the current mainstream evaluation sets MMLU and C-Eval.
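The style sensitivity described above can be illustrated with a small sketch. The function below is a simplified unigram-overlap F1 in the spirit of ROUGE-1, not the official BLEU or ROUGE implementation, and the example sentences are made up for illustration rather than taken from ChemBench.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Simplified unigram-overlap F1 (ROUGE-1-like, illustrative only)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference answer and two model outputs.
reference = "the molecular formula of ethanol is C2H6O"
wrong_same_style = "the molecular formula of ethanol is C2H4O"  # factually wrong
right_other_style = "ethanol: C2H6O"                            # factually right

score_wrong = unigram_f1(wrong_same_style, reference)
score_right = unigram_f1(right_other_style, reference)

# The wrong answer that mimics the reference style scores higher.
print(f"wrong, same style:  {score_wrong:.3f}")
print(f"right, other style: {score_right:.3f}")
```

In this toy case the factually wrong answer wins on overlap simply because it copies the wording of the reference, which is the failure mode a multiple-choice format avoids.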

Dataset Overview

To rigorously evaluate language models' understanding of chemistry, the research team built ChemBench, a benchmark of nine tasks on chemical molecules and reactions, the same tasks as in ChemData. It contains 4,100 multiple-choice questions, each with exactly one correct answer. The benchmark provides a basis for objectively measuring the chemistry ability of large language models.
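Because every question has exactly one correct option, scoring reduces to exact-match accuracy. The following is a minimal sketch of that idea, assuming model outputs have already been reduced to a single option letter; the function name and the sample letters are hypothetical and do not reflect the dataset's actual schema or the authors' evaluation code.

```python
def multiple_choice_accuracy(predictions, answers):
    """Fraction of questions where the predicted option letter matches the key.

    Assumes each prediction and each answer is a single option letter;
    comparison ignores case and surrounding whitespace.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


# Hypothetical predictions and answer key for three questions.
preds = ["A", "b ", "C"]
keys = ["A", "B", "D"]
print(multiple_choice_accuracy(preds, keys))
```

Unlike overlap metrics, this score does not depend on how the model phrases its answer, only on whether it selects the correct option.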

The distribution of all tasks in ChemBench is shown in the figure.

 

Introduction to other open source datasets


ChemData700K Dataset

ChemData700K is an instruction fine-tuning dataset for the chemistry capabilities of large language models. It covers nine core chemistry tasks and contains 730K high-quality question-answer pairs, a one-tenth sample of roughly seven million records. The dataset spans a wide range of chemical domain knowledge and follows three main task categories (molecules, reactions, and domain-specific tasks).

ChemPref-10K Dataset

ChemPref-10K can be used to optimize language models to match human preferences. It is available in both English and Chinese versions.

C-MHChem Dataset

C-MHChem is a high-quality, fully manually written multiple-choice test benchmark. It consists of 600 questions collected from junior high school, high school, and college entrance examinations held in various parts of China over the past 25 years.

ChemBench4K.torrent
  • ChemBench4K/
    • README.md
      3.08 KB
    • README.txt
      6.17 KB
    • data/
      • ChemBench4K.zip
        509.65 KB