HyperAI

CapsFusion-120M Multimodal Image and Text Dataset

Date

a year ago

Size

23.21 GB

Organization

Tsinghua University

Publish URL

github.com


This is a multimodal image-text dataset released by Tsinghua University and BAAI in 2024. The accompanying paper, "CapsFusion: Rethinking Image-Text Data at Scale", has been accepted by CVPR 2024.

The dataset is intended as a high-quality resource for large-scale multimodal pre-training. This version also includes the corresponding captions from the LAION-2B and LAION-COCO datasets, which makes it easier to compare caption sources and to study the quality of image-text data in more depth.

Each data entry has four fields:

  • Image URL
  • LAION-2B caption (original alt-text from the web)
  • LAION-COCO caption (synthesized by BLIP)
  • CapsFusion caption (produced by the research team)
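
As a rough illustration of the comparative analysis these four fields allow, the sketch below models one entry and compares the average caption length per source. The field names (`image_url`, `laion_2b`, `laion_coco`, `capsfusion`), the `CaptionRecord` class, and the sample entries are all assumptions made for this example; the actual keys and file format inside the archive are not described here and may differ.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CaptionRecord:
    # Hypothetical field names; the archive's real keys may differ.
    image_url: str
    laion_2b: str    # original alt-text from the web
    laion_coco: str  # caption synthesized by BLIP
    capsfusion: str  # caption produced by the research team


def mean_caption_lengths(records):
    """Return the average word count for each caption source."""
    sources = ("laion_2b", "laion_coco", "capsfusion")
    return {
        source: mean(len(getattr(r, source).split()) for r in records)
        for source in sources
    }


# Made-up entries for illustration only; these are not taken from the dataset.
sample = [
    CaptionRecord(
        image_url="https://example.com/a.jpg",
        laion_2b="Golden Gate Bridge sunset print",
        laion_coco="a bridge over water",
        capsfusion="the Golden Gate Bridge over water at sunset",
    ),
    CaptionRecord(
        image_url="https://example.com/b.jpg",
        laion_2b="IMG_0042.jpg",
        laion_coco="a dog sitting on a couch",
        capsfusion="a dog sitting on a couch indoors",
    ),
]

lengths = mean_caption_lengths(sample)
print(lengths)
```

Word count is only a crude proxy for caption quality; it is used here because it needs no extra dependencies.
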
CapsFusion-120M.torrent
Seeding: 1 | Downloading: 1 | Completed: 77 | Total Downloads: 148
  • CapsFusion-120M/
    • README.md
      1.34 KB
    • README.txt
      2.69 KB
    • data/
      • CapsFusion-120M.zip
        23.21 GB