ShareGPT4V: Large-Scale, High-Quality Image-Text Dataset
License: CC BY-SA 4.0

ShareGPT4V is a large-scale, high-quality dataset of image-text pairs for training vision-language models (VLMs), with the aim of improving their image understanding and text generation. It contains 1.2 million image-text pairs intended to align visual and language features and to strengthen instruction following, and it incorporates academic tasks such as ScienceQA, TextVQA, and SBU. Models trained with this dataset show significantly better image-text alignment, a key aspect of multimodal representation learning.
The dataset was released in 2023 by the University of Science and Technology of China and the Shanghai Artificial Intelligence Laboratory.