CC12M Image-text Pairs Dataset
Date
3 years ago
License
Other

CC12M (Conceptual 12M) is an image-text pair dataset designed for vision-and-language pre-training. It contains about 12 million image-text pairs. Compared with CC3M (Conceptual Captions 3M), pre-training on CC12M gives better results on multiple downstream tasks, particularly those that involve long-tail visual recognition.
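
As a rough illustration of how an image-text pair dataset can be consumed, the sketch below reads pairs from tab-separated text. The two-column layout (image URL, then caption), the ImageTextPair type, the read_pairs helper, and the sample rows are all assumptions made for this example; none of them is specified on this page, so check the actual file format before relying on it.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class ImageTextPair(NamedTuple):
    """One record: where the image lives and the text that describes it."""
    image_url: str
    caption: str


def read_pairs(lines: Iterable[str]) -> Iterator[ImageTextPair]:
    """Yield image-text pairs from tab-separated lines.

    Assumes two columns per row (image URL, caption); rows with any
    other number of columns are skipped.
    """
    # QUOTE_NONE keeps quotation marks inside captions as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            continue  # malformed row, skip it
        yield ImageTextPair(image_url=row[0], caption=row[1])


# Hypothetical sample rows, not taken from the dataset.
sample = io.StringIO(
    "https://example.com/a.jpg\ta dog running on a beach\n"
    "https://example.com/b.jpg\ta red bicycle leaning against a wall\n"
)
pairs = list(read_pairs(sample))
print(len(pairs), pairs[0].caption)
```

To read from a file instead, pass an open text file handle (opened with newline="" and encoding="utf-8") to read_pairs in place of the in-memory sample.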