WIT Image-text Dataset
Date
3 years ago
Size
25.2 GB
Publish URL
License
Other
Categories

WIT, short for Wikipedia-based Image Text, is a large multimodal, multilingual dataset. It is a curated collection of 37.6 million entity-rich image-text examples with 11.5 million unique images across 108 Wikipedia languages. Its scale makes it suitable as a pre-training dataset for multimodal machine learning models.
WIT has four unique advantages:
- At the time of its release, WIT was the largest multimodal dataset by number of image-text examples.
- Over 100 languages are covered (with at least 12,000 examples per language), and cross-lingual text is provided for many images.
- Relative to previous datasets, WIT represents a more diverse set of concepts and real-world entities.
- WIT provides a very challenging real-world test set.
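The per-language coverage figure above can be checked on a downloaded copy with a short script. The following is a minimal sketch, assuming the data is stored as tab-separated files with a header row and a column named `language`; that column name, the helper function names, and the sample rows are illustrative assumptions, not details given on this page.

```python
import csv
import io
from collections import Counter


def count_examples_per_language(tsv_file, language_column="language"):
    """Count rows per language in an open tab-separated file.

    The column name "language" is an assumption about the file layout.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return Counter(row[language_column] for row in reader)


def languages_below_threshold(counts, minimum=12_000):
    """Return the languages with fewer than `minimum` examples, sorted by code."""
    return sorted(lang for lang, n in counts.items() if n < minimum)


# Tiny in-memory sample standing in for a real file (made-up rows).
sample = io.StringIO(
    "language\tpage_title\n"
    "en\tExample A\n"
    "en\tExample B\n"
    "ja\tExample C\n"
)
counts = count_examples_per_language(sample)
print(counts.most_common())
print(languages_below_threshold(counts, minimum=2))
```

For a real file, the same function can be given any text-mode file object, for example one opened with `gzip.open(path, "rt", encoding="utf-8", newline="")` if the files turn out to be gzip-compressed. Summing the counters from all files before calling `languages_below_threshold` gives the dataset-wide view.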
WIT.torrent