PubMedVision Large-Scale Medical VQA Dataset
Date
Size
Publish URL
Categories
* This dataset supports online use.Click here to jump.
PubMedVision is a large-scale and high-quality medical multimodal dataset created in 2024 by a research team from Shenzhen Big Data Research Institute, the Chinese University of Hong Kong, and the National Health Data Institute. It contains 1.3 million medical VQA samples.HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale".
This dataset uses sophisticated data processing methods to select medical-related images and informative image descriptions from papers in the international medical journal PubMed, effectively filtering out a large number of medical-irrelevant images and context-irrelevant content. In order to improve the alignment of image and text data, the research team used the large visual model (GPT-4V) to re-describe the images and constructed 10 scene dialogues, rewriting the image and text data into a question-and-answer format, which enhanced the learning of medical visual knowledge.