Date

2 years ago

Size

53.54 GB

Organization

Publish URL

Paper URL

Tags

PubMedVision is a large-scale and high-quality medical multimodal dataset created in 2024 by a research team from Shenzhen Big Data Research Institute, the Chinese University of Hong Kong, and the National Health Data Institute. It contains 1.3 million medical VQA samples.HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale". This dataset uses sophisticated data processing methods to select medical-related images and informative image descriptions from papers in the international medical journal PubMed, effectively filtering out a large number of medical-irrelevant images and context-irrelevant content. In order to improve the alignment of image and text data, the research team used the large visual model (GPT-4V) to re-describe the images and constructed 10 scene dialogues, rewriting the image and text data into a question-and-answer format, which enhanced the learning of medical visual knowledge.

PubMedVision.torrent

Seeding 2Downloading 0Completed 266Total Downloads 756

PubMedVision/
- README.md
  1.46 KB
- README.txt
  2.93 KB

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.

Related Datasets

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

HyperAI

Use this Dataset

Discuss on Discord

Date

2 years ago

Size

53.54 GB

Organization

Publish URL

github.com

Paper URL

arxiv.org

Related Datasets

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

Command Palette

PubMedVision Large-Scale Medical VQA Dataset

Build AI with AI

HyperAI Newsletters

Command Palette

PubMedVision Large-Scale Medical VQA Dataset

Related Datasets

DRACO Cross-Disciplinary Deep Research Benchmark Dataset

ToolACE Complex Tools Learning Dialogue Dataset

Student Mental Health and Burnout Dataset

THINGS-EEG EEG Dataset

THINGS-MEG Magnetoencephalography Dataset

THINGS-fMRI Functional Magnetic Resonance Imaging Dataset

RoVid-X Robot Video Generation Dataset

Chest X-ray Pneumonia Dataset

LightOnOCR-mix-0126 Text Transcription Dataset

Patient Churn Prediction Dataset

HydroBASINS Global River Partition Dataset

MCIF Multimodal Cross-Language Instruction Following Dataset

MCD-rPPG Multi-Camera Remote Photoplethysmography Dataset

Build AI with AI

HyperAI Newsletters

Command Palette

PubMedVision Large-Scale Medical VQA Dataset

Related Datasets

DRACO Cross-Disciplinary Deep Research Benchmark Dataset

ToolACE Complex Tools Learning Dialogue Dataset

Student Mental Health and Burnout Dataset

THINGS-EEG EEG Dataset

THINGS-MEG Magnetoencephalography Dataset

THINGS-fMRI Functional Magnetic Resonance Imaging Dataset

RoVid-X Robot Video Generation Dataset

Chest X-ray Pneumonia Dataset

LightOnOCR-mix-0126 Text Transcription Dataset

Patient Churn Prediction Dataset

HydroBASINS Global River Partition Dataset

MCIF Multimodal Cross-Language Instruction Following Dataset

MCD-rPPG Multi-Camera Remote Photoplethysmography Dataset

Build AI with AI

HyperAI Newsletters

Related Datasets

DRACO Cross-Disciplinary Deep Research Benchmark Dataset

ToolACE Complex Tools Learning Dialogue Dataset

Student Mental Health and Burnout Dataset

THINGS-EEG EEG Dataset

THINGS-MEG Magnetoencephalography Dataset

THINGS-fMRI Functional Magnetic Resonance Imaging Dataset

RoVid-X Robot Video Generation Dataset

Chest X-ray Pneumonia Dataset

LightOnOCR-mix-0126 Text Transcription Dataset

Patient Churn Prediction Dataset

HydroBASINS Global River Partition Dataset

MCIF Multimodal Cross-Language Instruction Following Dataset

MCD-rPPG Multi-Camera Remote Photoplethysmography Dataset

Related Datasets

DRACO Cross-Disciplinary Deep Research Benchmark Dataset

ToolACE Complex Tools Learning Dialogue Dataset

Student Mental Health and Burnout Dataset

THINGS-EEG EEG Dataset

THINGS-MEG Magnetoencephalography Dataset

THINGS-fMRI Functional Magnetic Resonance Imaging Dataset

RoVid-X Robot Video Generation Dataset

Chest X-ray Pneumonia Dataset

LightOnOCR-mix-0126 Text Transcription Dataset

Patient Churn Prediction Dataset

HydroBASINS Global River Partition Dataset

MCIF Multimodal Cross-Language Instruction Following Dataset

MCD-rPPG Multi-Camera Remote Photoplethysmography Dataset