HyperAI

ViTT Dense Video Description Dataset

Date

3 years ago

Organization

Publish URL

github.com

License

Other

Categories


ViTT (Video Timeline Tags) is a dataset of 8,169 videos with manually generated segment-level annotations. Of these, 5,840 videos are annotated once, and the remaining 2,329 are annotated two or more times. In total, 12,461 sets of annotations have been released for this dataset. The videos are drawn from the YouTube-8M dataset.
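
The counts in the description can be cross-checked with a little arithmetic. The sketch below is only an illustration: it uses the three figures quoted above and assumes that every singly annotated video contributes exactly one annotation set. The function and key names are hypothetical and are not part of any ViTT tooling.

```python
# Figures quoted in the dataset description.
TOTAL_VIDEOS = 8_169
ANNOTATED_ONCE = 5_840
TOTAL_ANNOTATION_SETS = 12_461


def annotation_breakdown(total_videos, annotated_once, total_sets):
    """Derive counts for the videos that were annotated more than once."""
    multi_videos = total_videos - annotated_once
    # Assumption: each singly annotated video accounts for one annotation set.
    multi_sets = total_sets - annotated_once
    return {
        "multi_annotated_videos": multi_videos,
        "sets_on_multi_annotated_videos": multi_sets,
        "mean_sets_per_multi_annotated_video": multi_sets / multi_videos,
    }


stats = annotation_breakdown(TOTAL_VIDEOS, ANNOTATED_ONCE, TOTAL_ANNOTATION_SETS)
print(stats["multi_annotated_videos"])                          # 2329
print(stats["sets_on_multi_annotated_videos"])                  # 6621
print(round(stats["mean_sets_per_multi_annotated_video"], 2))   # 2.84
```

A mean of roughly 2.84 annotation sets per multiply annotated video is consistent with the statement that those videos are annotated twice or more.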