ViTT Dense Video Description Dataset
Date
3 years ago
Publish URL
License
Other
Categories

ViTT (Video Timeline Tags) is a dense video description dataset of 8,169 videos with manually produced segment-level annotations. Of these, 5,840 videos were annotated once, and the rest were annotated two or more times. In total, 12,461 annotation sets have been released for the dataset. The videos are drawn from the YouTube-8M dataset.
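
The counts above imply a few figures that are not stated directly. The following is a minimal sketch of that arithmetic, using only the numbers quoted in the description. The variable names are illustrative and do not come from the dataset's own files or tooling.

```python
# Figures quoted in the dataset description.
total_videos = 8_169
annotated_once = 5_840
total_annotation_sets = 12_461

# Videos that received two or more annotation sets.
annotated_multiple = total_videos - annotated_once

# Annotation sets belonging to those multiply annotated videos
# (each singly annotated video accounts for exactly one set).
sets_on_multiple = total_annotation_sets - annotated_once

# Average number of annotation sets per multiply annotated video,
# and per video across the whole dataset.
avg_sets_per_multiple = sets_on_multiple / annotated_multiple
avg_sets_overall = total_annotation_sets / total_videos

print(annotated_multiple)               # 2329
print(sets_on_multiple)                 # 6621
print(round(avg_sets_per_multiple, 2))  # 2.84
print(round(avg_sets_overall, 2))       # 1.53
```

Under these numbers, roughly 29% of the videos carry more than one annotation set, which may matter when choosing between single-reference and multi-reference evaluation.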