Command Palette
Search for a command to run...
HowToVQA69M Video Question Answering Dataset
Date
Size
Publish URL
Paper URL
License
Other

VQA stands for Visual question answering. HowToVQA69M is a video question answering dataset containing 69,270,581 questions and answers. Its scale is twice as large as the existing video question answering dataset VideoQA.
On average, each raw video generates 43 video clips, each 12.1 seconds long and associated with 1.2 questions and answers, with questions containing 8.7 words and answers containing 2.4 words. The HowToVQA69M dataset is highly diverse, containing more than 16 million unique answers, of which more than 2 million unique answers appear more than once and more than 300,000 unique answers appear more than 10 times.
Build AI with AI
From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.