HyperAI

LLM4Mat-Bench Crystal Structure Dataset

Date

a month ago

Organization

Princeton University
University of Toronto

Publish URL

github.com

LLM4Mat-Bench is a multimodal benchmark dataset for evaluating large language models (LLMs) on materials property prediction, created jointly by Princeton University, the University of Toronto, and other institutions. It accompanies the paper "LLM4Mat-Bench: Benchmarking Large Language Models for Materials Property Prediction", which evaluates LLM performance on materials property prediction and materials discovery tasks. The dataset contains about 1.97 million crystal structure samples drawn from 10 public materials databases and covers 45 distinct physical and chemical properties. It is described as the largest benchmark to date for evaluating LLMs on materials property prediction.

LLM4Mat-Bench Statistics

Each record in the dataset provides several input modalities: the chemical composition of the crystal, a standard crystallographic information file (CIF), and a natural-language description of the crystal structure generated by the Robocrystallographer tool. Together, these modalities give a comprehensive representation of the material that can serve as LLM input across different task settings.

Total data volume by modality:

  • Composition (chemical composition of the crystal): about 4.7M tokens
  • CIF (crystal structure file): about 615.5M tokens
  • Text descriptions: about 3.1B tokens
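To put the three figures above in proportion, the short sketch below sums the approximate token counts and reports each modality's share of the total. The counts are the rounded values listed above; the dictionary keys and the helper name `modality_shares` are illustrative choices, not part of the dataset.

```python
# Approximate token counts per modality, taken from the list above.
# The key names are illustrative labels, not official dataset field names.
TOKENS = {
    "composition": 4_700_000,
    "cif": 615_500_000,
    "description": 3_100_000_000,
}


def modality_shares(tokens):
    """Return each modality's fraction of the total token count."""
    total = sum(tokens.values())
    return {name: count / total for name, count in tokens.items()}


total_tokens = sum(TOKENS.values())
shares = modality_shares(TOKENS)

# Print a small summary table: count and percentage per modality.
for name, share in shares.items():
    print(f"{name:12s} {TOKENS[name]:>14,d} tokens  {share:6.2%}")
print(f"{'total':12s} {total_tokens:>14,d} tokens")
```

With these rounded figures, the text descriptions make up roughly 83% of all tokens and the CIF files about 17%, while the composition strings are a small fraction of one percent. This suggests that the choice of input modality largely determines the context length a model has to handle.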

The dataset was built by collecting the original CIF files and material properties from several mainstream materials databases, then automatically generating natural-language structure descriptions from the crystal structures, which yields multimodal samples in a unified format. Each sample record contains the material ID, chemical formula, property values (such as band gap, formation energy, density, and elastic modulus), and other information.
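As a rough illustration of the record layout described above, the sketch below models one sample as a Python dataclass and builds a simple text prompt from a chosen modality. All field names, the `build_prompt` helper, the prompt wording, and the example values are assumptions made for illustration; the actual dataset files may use different column names and formats.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MaterialRecord:
    """Hypothetical in-memory view of one sample (field names are assumed)."""
    material_id: str
    formula: str          # chemical composition modality
    cif: str              # crystal structure file contents
    description: str      # natural-language structure description
    properties: Dict[str, Optional[float]] = field(default_factory=dict)


def build_prompt(record: MaterialRecord, modality: str, target: str) -> str:
    """Build a plain-text prompt from one input modality (illustrative only)."""
    inputs = {
        "composition": record.formula,
        "cif": record.cif,
        "description": record.description,
    }
    if modality not in inputs:
        raise ValueError(f"unknown modality: {modality!r}")
    return (
        f"Input ({modality}):\n{inputs[modality]}\n"
        f"Predict the {target} of this material."
    )


# Placeholder record: the ID and property values are made up, not dataset entries.
example = MaterialRecord(
    material_id="example-0001",
    formula="NaCl",
    cif="data_NaCl\n_cell_length_a 5.64\n",
    description="NaCl crystallizes in the cubic rock-salt structure.",
    properties={"band_gap": None, "formation_energy": None},
)

print(build_prompt(example, "composition", "band_gap"))
```

Keeping the three modalities as separate fields of one record makes it straightforward to evaluate the same model on composition, CIF, or description input for the same target property.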

The core goal of LLM4Mat-Bench is to foster the integration of materials science and natural language processing, and to advance research and applications in areas such as task-specific model evaluation, property prediction, and instruction fine-tuning. Its multi-source, multimodal, large-scale nature makes it an important reference benchmark for research on materials language models.