MP-20-PXRD Atomic Materials Benchmark Dataset

Date: a month ago

Organization: Stanford University

Publish URL: github.com

The MP-20-PXRD benchmark dataset was jointly proposed by Columbia University and Stanford University in 2025 for the end-to-end training of PXRDnet, a generative AI method for structure solution based on diffusion models. The related research was published in Nature Materials under the title "Ab initio structure solutions from nanocrystalline powder diffraction data via diffusion models".

The dataset consists of materials sampled from the Materials Project database, each with at most 20 atoms in the unit cell. It contains 45,229 materials, split into training, validation, and test sets in a 90% / 7.5% / 2.5% ratio.
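
As a rough illustration of what that ratio means in absolute numbers, the sketch below computes approximate split sizes for 45,229 entries and draws a reproducible random partition of indices. The helper names `split_sizes` and `split_indices`, the rounding convention, and the seed are assumptions made for this example. The published dataset may ship its own fixed split files, whose exact counts could differ slightly from these.

```python
import random

TOTAL_MATERIALS = 45_229          # dataset size stated above
FRACTIONS = (0.90, 0.075, 0.025)  # train / validation / test


def split_sizes(n, fractions=FRACTIONS):
    """Return (train, val, test) counts that sum exactly to n.

    Assumption: train and validation are rounded to the nearest
    integer and the test set absorbs the remainder.
    """
    train = round(n * fractions[0])
    val = round(n * fractions[1])
    test = n - train - val
    return train, val, test


def split_indices(n, seed=0, fractions=FRACTIONS):
    """Shuffle indices 0..n-1 with a fixed seed and cut them into three lists."""
    n_train, n_val, _ = split_sizes(n, fractions)
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG, global state untouched
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    print(split_sizes(TOTAL_MATERIALS))  # (40706, 3392, 1131)
    train, val, test = split_indices(TOTAL_MATERIALS)
    print(len(train), len(val), len(test))
```

Under these assumptions the split comes to roughly 40,700 training, 3,400 validation, and 1,100 test materials.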