ChemData Chemical Task Dataset

Date: a year ago
Size: 242.89 MB
Organization: Shanghai Artificial Intelligence Laboratory
Paper URL: arxiv.org

Dataset Introduction

This dataset was open-sourced in 2024 by the Shanghai Artificial Intelligence Laboratory together with its first large model for science, the Pu Ke Chemical Large Model (ChemLLM). The accompanying paper is "ChemLLM: A Chemical Large Language Model".

The dataset mainly consists of ChemData700K. The research team also open-sourced the Chinese and English versions of ChemBench-4K, ChemPref-10K, and the C-MHChem dataset.

ChemData700K dataset

ChemData700K is an instruction fine-tuning dataset for the chemistry capabilities of large language models. It covers 9 core chemistry tasks and contains 730K high-quality question-answer pairs, sampled at a ratio of 1/10 from a pool of 7 million entries. The dataset spans a wide range of chemical domain knowledge and is divided into 3 main task categories: molecules, reactions, and domains.

ChemBench4K benchmark dataset

ChemBench is a benchmark consisting of 9 tasks on chemical molecules and reactions, the same 9 tasks as in ChemData. It provides a basis for objectively measuring the chemistry proficiency of LLMs. ChemBench contains 4,100 multiple-choice questions, each with exactly one correct answer.
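Because every question has a single correct option, a model's score on a benchmark of this kind reduces to plain accuracy. The following is a minimal sketch of that scoring step, not the official ChemBench evaluation code. The `Question` class, its field names, and the toy items are hypothetical and do not reflect the dataset's actual schema.

```python
from dataclasses import dataclass


@dataclass
class Question:
    """One multiple-choice item. Field names are hypothetical, not the dataset's schema."""
    prompt: str
    choices: dict[str, str]  # option label -> option text
    answer: str              # label of the single correct option


def accuracy(questions: list[Question], predictions: list[str]) -> float:
    """Return the fraction of predicted labels that match the answer label."""
    if len(questions) != len(predictions):
        raise ValueError("questions and predictions must have the same length")
    if not questions:
        return 0.0
    # Compare labels case-insensitively and ignore surrounding whitespace.
    correct = sum(
        q.answer.strip().upper() == p.strip().upper()
        for q, p in zip(questions, predictions)
    )
    return correct / len(questions)


# Toy items for illustration only; they are not taken from ChemBench.
toy = [
    Question(
        "What is the chemical formula of water?",
        {"A": "H2O", "B": "CO2", "C": "NaCl", "D": "O2"},
        "A",
    ),
    Question(
        "Which element has atomic number 6?",
        {"A": "Oxygen", "B": "Carbon", "C": "Nitrogen", "D": "Helium"},
        "B",
    ),
]

print(accuracy(toy, ["A", "C"]))  # 0.5
```

In practice the predicted label would first have to be extracted from the model's free-text output, which this sketch leaves out.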

ChemPref-10K dataset

This dataset can be used to align language models with human preferences. It is available in both English and Chinese versions.

C-MHChem dataset

C-MHChem is a high-quality, entirely hand-written multiple-choice benchmark of 600 questions collected from junior high school, high school, and college entrance examinations across China over the past 25 years.

ChemLLM-Dataset.torrent
  • ChemLLM-Dataset/
    • README.md (2.09 KB)
    • README.txt (4.18 KB)
    • data/
      • chem.zip (242.89 MB)
