MSR-VTT Video Caption Dataset
Date
3 years ago
Size
8.08 GB
Publish URL
License
Other

MSR-VTT (Microsoft Research Video to Text) is a large-scale, open-domain video captioning dataset.
The dataset contains 10,000 video clips from 20 categories, each annotated with 20 English sentences by Amazon Mechanical Turk workers. The captions contain about 29,000 unique words in total. The standard split uses 6,513 clips for training, 497 clips for validation, and 2,990 clips for testing.
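The split sizes above can be checked with a short Python sketch. It assumes the clips are assigned to train, validation, and test as contiguous ranges of an ordered ID list, and it uses hypothetical IDs of the form `video0` ... `video9999`; neither the ordering nor the naming scheme is stated in this description, and `build_splits` is an illustrative helper, not part of any official toolkit.

```python
# Split sizes and captions per clip, as stated in the dataset description.
SPLIT_SIZES = {"train": 6513, "val": 497, "test": 2990}
CAPTIONS_PER_CLIP = 20


def build_splits(video_ids, sizes=SPLIT_SIZES):
    """Partition an ordered list of clip IDs into contiguous splits.

    Assumption: splits are contiguous ranges in the given order.
    """
    if len(video_ids) != sum(sizes.values()):
        raise ValueError("number of clips does not match the split sizes")
    splits, start = {}, 0
    for name, size in sizes.items():
        splits[name] = video_ids[start:start + size]
        start += size
    return splits


# Hypothetical clip IDs; the real naming scheme may differ.
video_ids = [f"video{i}" for i in range(10_000)]
splits = build_splits(video_ids)

# Total number of caption sentences implied by 20 captions per clip.
total_captions = len(video_ids) * CAPTIONS_PER_CLIP
```

With these figures the three splits account for all 10,000 clips, and 20 captions per clip implies 200,000 caption sentences overall.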
MSR-VTT.torrent
Seeding: 2 | Downloading: 1 | Completed: 783 | Total Downloads: 1,816