Date

2 years ago

Size

260.89 MB

Organization

Publish URL

github.com

Paper URL

arxiv.org

Tags

LLM

Natural Language Processing

Protein

Biomolecules

Mol-Instructions is a large-scale biomolecular instruction dataset designed for large language models. It was created by a research team from Zhejiang University in 2024. The accompanying paper, "Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models", was accepted at ICLR 2024. The dataset aims to provide rich instruction data that improves the understanding and prediction capabilities of large language models in the biomolecular domain. It contains three types of instructions:

- Molecule-oriented instructions (148,400 instructions): cover the basic properties and behaviors of small molecules, including a variety of chemical reaction and molecular design tasks.
- Protein-oriented instructions (505,000 instructions): cover protein structure, function, and activity prediction, as well as protein design based on text instructions.
- Biomolecular text instructions (53,000 instructions): mainly used for natural language processing tasks in bioinformatics and cheminformatics.
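As a quick check on the figures quoted in the description, the following minimal Python sketch totals the three instruction counts and computes each category's share. The counts are copied from the description; the dictionary and function names are illustrative and are not part of the dataset or its tooling.

```python
# Instruction counts as stated in the dataset description.
# The names used here are illustrative, not identifiers from the dataset.
INSTRUCTION_COUNTS = {
    "molecule-oriented": 148_400,
    "protein-oriented": 505_000,
    "biomolecular text": 53_000,
}


def summarize(counts):
    """Return the total count and each category's share in percent (1 decimal)."""
    total = sum(counts.values())
    shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
    return total, shares


total, shares = summarize(INSTRUCTION_COUNTS)
print(f"total instructions: {total:,}")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

By these figures the dataset holds 706,400 instructions in total, of which roughly seven in ten are protein-oriented, so the class imbalance is worth keeping in mind when sampling for fine-tuning.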

Mol-Instructions.torrent

Seeding: 1, Downloading: 0, Completed: 128, Total Downloads: 210

Mol-Instructions/
- README.md
  1.69 KB
- README.txt
  3.39 KB

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.
