VL-Health Medical Reasoning Generation Dataset
VL-Health is the first comprehensive dataset for medical multimodal understanding and generation, released in 2025 by teams from Zhejiang University, the University of Electronic Science and Technology of China, and other institutions. The accompanying paper is "HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation".
The dataset integrates 765,000 comprehension-task samples and 783,000 generation-task samples, covering 11 medical imaging modalities (including CT, MRI, X-ray, and OCT) and a wide range of disease scenarios, from lung diseases to brain tumors.
Understanding tasks:
VL-Health integrates specialized datasets such as VQA-RAD (radiology question answering), SLAKE (semantically labeled, knowledge-enhanced VQA), and PathVQA (pathology question answering), supplemented with large-scale multimodal data such as LLaVA-Med and PubMedVision, so that the model learns the full chain of capabilities from basic image recognition to complex pathological reasoning.
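The exact file layout of the released data is not specified here; as a minimal sketch, assuming the comprehension split is stored as JSON lines with hypothetical field names (`image`, `question`, `answer`, `source`), loading it might look like this:

```python
# Minimal sketch of iterating over comprehension (VQA-style) samples.
# The file name and field names below are assumptions, not the actual
# VL-Health schema, which may differ.
import json
from pathlib import Path

def load_vqa_samples(path: str):
    """Yield one sample dict per JSON line (e.g. VQA-RAD or SLAKE entries)."""
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            sample = json.loads(line)
            yield {
                "image": sample["image"],          # path to the medical image
                "question": sample["question"],    # question about the image
                "answer": sample["answer"],        # ground-truth answer text
                "source": sample.get("source", "unknown"),  # e.g. "VQA-RAD"
            }

if __name__ == "__main__":
    for s in load_vqa_samples("vl_health_understanding.jsonl"):
        print(s["source"], "|", s["question"][:60])
        break
```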
Generation tasks:
The generation tasks focus on four directions (see the sketch after this list):
- Modality conversion: uses the paired CT-MRI data from SynthRAD2023 to train cross-modality conversion;
- Super-resolution: uses high-resolution brain MRI from the IXI dataset to improve the fidelity of image detail reconstruction;
- Text-to-image generation: uses X-ray images and paired reports from MIMIC-CXR to generate images from text descriptions;
- Image reconstruction: adapts the LLaVA-558k dataset to train the model's image encoding-decoding capability.
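Each of the four directions can be viewed as a (conditioning input, target image) pair. The sketch below illustrates one way to represent that; the `GenTask` names, `GenSample` fields, and file paths are illustrative assumptions, not the dataset's actual format:

```python
# Sketch of the four generation directions as (condition, target) pairs.
# The task names mirror the list above; everything else is hypothetical.
from dataclasses import dataclass
from enum import Enum

class GenTask(Enum):
    MODALITY_CONVERSION = "ct_to_mri"   # SynthRAD2023 CT-MRI pairs
    SUPER_RESOLUTION = "sr"             # IXI brain MRI, low-res -> high-res
    TEXT_TO_IMAGE = "report_to_xray"    # MIMIC-CXR report -> X-ray image
    RECONSTRUCTION = "autoencode"       # LLaVA-558k, image -> same image

@dataclass
class GenSample:
    task: GenTask
    condition: str  # conditioning input: source image path or report text
    target: str     # target image path the model should generate

def describe(sample: GenSample) -> str:
    return f"[{sample.task.name}] {sample.condition} -> {sample.target}"

if __name__ == "__main__":
    # Hypothetical text-to-image sample: a report conditions an X-ray target.
    s = GenSample(GenTask.TEXT_TO_IMAGE,
                  condition="reports/p10/s5023.txt",
                  target="images/p10/s5023.png")
    print(describe(s))
```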

Dataset classification