ProtT3 Protein Text Question Answering Dataset

Date

2 years ago

Size

1.4 GB

Organization

Publish URL

github.com

Paper URL

arxiv.org

Tags

The ProtT3 dataset was jointly constructed in 2024 by research teams from the National University of Singapore, the University of Science and Technology of China, and Hokkaido University. It accompanies the paper "ProtT3: Protein-to-Text Generation for Text-based Protein Understanding", which was accepted at ACL 2024, and served as the pre-training data for that research. The ProtT3 dataset consists of three sub-datasets: Swiss-Prot, ProteinKG25, and PDB-QA.

As shown in the table above, Swiss-Prot is a protein sequence database with text annotations. The researchers processed the dataset and removed protein names from the text annotations to prevent information leakage. Each resulting text description concatenates the protein's function, location, and family annotations.

ProteinKG25 is a knowledge graph derived from the Gene Ontology database. The researchers first aggregated the triplets belonging to the same protein and then filled the protein information into a predefined text template, converting its triplets into free text.

PDB-QA is a single-turn protein question-answering dataset derived from RCSB PDB. It contains 30 question templates covering protein structure, properties, and supplementary information. As shown in the table below, for fine-grained evaluation the researchers divided the questions into 4 categories based on the answer format (string or number) and the content focus (structure/property or supplementary information).
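The three text-construction steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' preprocessing code: the function names, the input fields (`name`, `function`, `location`, `family`), the `(protein, relation, term)` triplet shape, the `KG_TEMPLATE` string, and the two `FOCI` labels are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical template for ProteinKG25; the paper's actual template is not given here.
KG_TEMPLATE = "The protein has relation '{relation}' with {terms}."

# Hypothetical labels for the two PDB-QA content-focus groups.
FOCI = ("structure/property", "supplementary")


def swissprot_text(name, function, location, family):
    """Concatenate function, location and family annotations into one
    description, removing the protein name to avoid information leakage."""
    text = " ".join(part.strip() for part in (function, location, family) if part)
    if name:
        # Drop every case-insensitive occurrence of the protein name.
        text = re.sub(re.escape(name), "", text, flags=re.IGNORECASE)
    # Collapse whitespace left behind by the removal.
    return re.sub(r"\s+", " ", text).strip()


def kg_to_text(triplets):
    """Group (protein, relation, term) triplets by protein and fill each
    relation into the text template, returning {protein: free_text}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for protein, relation, term in triplets:
        grouped[protein][relation].append(term)
    return {
        protein: " ".join(
            KG_TEMPLATE.format(relation=rel, terms=", ".join(terms))
            for rel, terms in relations.items()
        )
        for protein, relations in grouped.items()
    }


def _is_number(answer):
    """Treat an answer as numeric if it parses as a float."""
    try:
        float(answer)
        return True
    except (TypeError, ValueError):
        return False


def qa_category(answer, focus):
    """Map a PDB-QA answer to one of 4 (focus, format) categories."""
    if focus not in FOCI:
        raise ValueError(f"unknown focus: {focus!r}")
    return (focus, "number" if _is_number(answer) else "string")
```

For example, `swissprot_text("Myoglobin", "Myoglobin stores oxygen in muscle.", "Cytoplasm.", "Belongs to the globin family.")` yields a description with the name stripped out. Plain string removal can leave sentence fragments, so the original pipeline may well mask names differently; the sketch only shows where leakage prevention fits in the flow.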

Citation

```bib
@inproceedings{liu2024prott,
  title={ProtT3: Protein-to-Text Generation for Text-based Protein Understanding},
  author={Liu, Zhiyuan and Zhang, An and Fei, Hao and Zhang, Enzhi and Wang, Xiang and Kawaguchi, Kenji and Chua, Tat-Seng},
  booktitle={{ACL}},
  publisher={Association for Computational Linguistics},
  year={2024},
  url={https://openreview.net/forum?id=ZmIjOPil2b}
}
```

ProtT3.torrent


ProtT3/
- README.md
  2.13 KB
- README.txt
  4.26 KB

