ShareGPT4V: Large-Scale, High-Quality Image-Text Dataset
Date
Size
Publish URL
License
CC BY-SA 4.0
Categories

ShareGPT4V is a large-scale, high-quality image-text dataset for training vision-language models (VLMs), aimed at improving image understanding and text generation. It contains 1.2 million image-text pairs intended to align visual and language features and to strengthen instruction following, and it incorporates data from academic tasks such as ScienceQA, TextVQA, and SBU. Models trained with this dataset are reported to improve substantially in image-text alignment, a key aspect of multimodal representation learning.
The dataset was released in 2023 by the University of Science and Technology of China and the Shanghai Artificial Intelligence Laboratory.
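
As an illustration of what working with image-text pairs can look like, here is a minimal Python sketch that reads records from a JSON array and keeps only those with both an image path and non-empty text. The field names `image` and `caption`, the `ImageTextPair` class, and the sample records are assumptions made for this example; they are not the dataset's actual schema, which should be checked against the official release.

```python
import json
from dataclasses import dataclass
from typing import Iterator


@dataclass
class ImageTextPair:
    image_path: str  # hypothetical field: path to the image file
    caption: str     # hypothetical field: text paired with the image


def iter_pairs(json_text: str) -> Iterator[ImageTextPair]:
    """Yield image-text pairs from a JSON array, skipping incomplete records."""
    for record in json.loads(json_text):
        image = record.get("image")
        caption = record.get("caption")
        # Keep only records that have both an image path and non-empty text.
        if image and caption and caption.strip():
            yield ImageTextPair(image_path=image, caption=caption.strip())


# Toy records for illustration only; they are not taken from the dataset.
sample = json.dumps([
    {"image": "images/0001.jpg", "caption": "A dog runs across a grassy field."},
    {"image": "images/0002.jpg", "caption": "   "},
    {"caption": "Text with no image."},
])

pairs = list(iter_pairs(sample))
print(len(pairs))
```

Filtering incomplete records before training is a common precaution, since a pair with a missing image or empty text contributes nothing to image-text alignment.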