Date

3 years ago

Organization

Publish URL

github.com

Paper URL

arxiv.org

License

Other

Tags

Video Captioning

Video Understanding

ViTT (Video Timeline Tags) is a dataset of 8,169 videos with manually written segment-level annotations. Of these, 5,840 videos are annotated once, and the rest are annotated twice or more. In total, 12,461 sets of annotations have been released. The videos come from the YouTube-8M dataset.
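The counts above imply a few figures that the description does not state directly. The following is a minimal sketch of that arithmetic in Python. The variable names are illustrative only and are not part of any ViTT tooling, and the averages assume that every video annotated once contributes exactly one annotation set.

```python
# Figures quoted in the dataset description.
total_videos = 8169
single_annotated_videos = 5840
total_annotation_sets = 12461

# Videos annotated twice or more ("the rest").
multi_annotated_videos = total_videos - single_annotated_videos  # 2329

# Annotation sets belonging to those multiply annotated videos,
# assuming each single-annotated video has exactly one set.
multi_annotation_sets = total_annotation_sets - single_annotated_videos  # 6621

# Average number of annotation sets per video, overall and for the
# multiply annotated subset (roughly 1.53 and 2.84).
avg_sets_per_video = total_annotation_sets / total_videos
avg_sets_per_multi_video = multi_annotation_sets / multi_annotated_videos

print(multi_annotated_videos, multi_annotation_sets)
print(round(avg_sets_per_video, 2), round(avg_sets_per_multi_video, 2))
```

Under these assumptions, about 29% of the videos carry more than one annotation set, which matters if multiple references per video are needed for evaluation.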

Related Datasets

olmOCR-mix-1025 Document Recognition Dataset

3 months ago

VAP-Data Visual Action Performance Dataset

2 months ago

MUVR Multimodal Uncropped Video Retrieval Benchmark

2 months ago

VideoRewardBench Video Reward Model Evaluation Dataset

2 months ago

MCD-rPPG Multi-Camera Remote Photoplethysmography Dataset

a month ago

Camera Clone Multi-view Dataset

2 months ago

ViTT Dense Video Description Dataset
