DRfold2 RNA Structure Test Dataset

Date

a month ago

Organization

National University of Singapore


The DRfold2 dataset was created in 2025 by Professor Zhang Yang's team at the National University of Singapore. The related paper is titled "Ab initio RNA structure prediction with composite language model and denoised end-to-end learning". It is an independent test dataset constructed to objectively evaluate the performance of DRfold2 in that study.

It contains 28 RNA structures, each with a sequence length of less than 400 nt, drawn from the following 3 categories:

  • The latest RNA-Puzzles target sequences
  • RNA target sequences from the CASP15 competition
  • The most recently published RNA structures in the Protein Data Bank (PDB) as of August 1, 2024

Notably, the researchers excluded large synthetic RNA structures from the CASP15 dataset because they deviate from RNA structures found in nature, which are the primary focus of functional analysis and drug design.

To ensure rigorous model evaluation, the training set contains only RNA structures published before 2024 and excludes RNAs with greater than 80% sequence similarity to the test dataset.
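
The selection rules described on this page (test sequences shorter than 400 nt, training structures published before 2024, and no training RNA above 80% similarity to the test set) can be illustrated with a minimal Python sketch. This is not the DRfold2 authors' pipeline: the `RnaEntry` class, the function names, and the constants are hypothetical, and `difflib.SequenceMatcher` is only a rough stand-in for whatever sequence similarity tool the study actually used, which this page does not specify.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher

# Thresholds taken from the dataset description; the names are illustrative.
MAX_TEST_LENGTH = 400            # test sequences are shorter than 400 nt
TRAIN_CUTOFF = date(2024, 1, 1)  # training structures were published before 2024
MAX_SIMILARITY = 0.80            # similarity above this leads to exclusion


@dataclass(frozen=True)
class RnaEntry:
    """Hypothetical record for one RNA structure."""
    identifier: str
    sequence: str
    released: date


def is_test_candidate(entry: RnaEntry) -> bool:
    """Apply only the length criterion stated for the test set."""
    return len(entry.sequence) < MAX_TEST_LENGTH


def similarity(a: str, b: str) -> float:
    """Approximate similarity in [0, 1].

    Assumption: difflib's ratio replaces a proper alignment-based
    similarity measure, purely for illustration.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def filter_training_set(train: list[RnaEntry], test: list[RnaEntry]) -> list[RnaEntry]:
    """Keep training entries released before the cutoff and not too similar to any test entry."""
    kept = []
    for entry in train:
        if entry.released >= TRAIN_CUTOFF:
            continue  # published in 2024 or later
        if any(similarity(entry.sequence, t.sequence) > MAX_SIMILARITY for t in test):
            continue  # more than 80% similar to a test sequence
        kept.append(entry)
    return kept
```

Here the 80% threshold is read as applying to each training RNA against every test RNA individually, which is one plausible interpretation of the description above.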