CapsFusion-120M Multimodal Image and Text Dataset
Date: a year ago
Size: 23.21 GB

This dataset is a multimodal image-text dataset released by Tsinghua University and BAAI in 2024. The associated paper, "CapsFusion: Rethinking Image-Text Data at Scale", has been accepted by CVPR 2024.
The dataset is intended as a high-quality resource for large-scale multimodal pre-training. This version also includes the corresponding captions from the LAION-2B and LAION-COCO datasets, which makes it easier to compare caption sources and to study the quality of image-text data in more depth.
Each data entry has four fields:
- Image URL
- LAION-2B caption (original alt text from the web)
- LAION-COCO caption (synthesized by BLIP)
- CapsFusion caption (produced by the research team)
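
The four fields above can be modeled as a simple record, which is handy for the kind of comparative analysis the dataset is meant to support. The following is a minimal sketch only: the class name, attribute names, and example values are hypothetical, and the actual column names and file format of the release are not stated here, so check them against the downloaded files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptionRecord:
    """One dataset entry. Attribute names are hypothetical, not the real column names."""
    image_url: str
    laion_2b_caption: str    # original alt text from the web
    laion_coco_caption: str  # synthesized by BLIP
    capsfusion_caption: str  # produced by the research team


def caption_word_counts(record: CaptionRecord) -> dict[str, int]:
    """Return the whitespace-delimited word count of each caption in the record."""
    return {
        "laion_2b": len(record.laion_2b_caption.split()),
        "laion_coco": len(record.laion_coco_caption.split()),
        "capsfusion": len(record.capsfusion_caption.split()),
    }


# Made-up example values for illustration; not taken from the dataset.
example = CaptionRecord(
    image_url="https://example.com/image.jpg",
    laion_2b_caption="Golden Gate Bridge sunset print",
    laion_coco_caption="a bridge over water at sunset",
    capsfusion_caption="The Golden Gate Bridge spans the bay at sunset",
)
print(caption_word_counts(example))
```

Word count is only a crude proxy for caption richness; it is used here because it needs no extra dependencies.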
CapsFusion-120M.torrent