Date

4 years ago

Tags

Visual Question Answering

Video Understanding

How2QA is a video-and-language dataset for multiple-choice video question answering. To build it, selected video clips were shown to Amazon Mechanical Turk (AMT) workers for question-answer annotation. Each worker was assigned one video clip and asked to write one question with four answer candidates (one correct answer and three distractors). The video narration was hidden from the workers so that the collected question-answer pairs would not be biased by the subtitles. The dataset contains 22,000 60-second clips and 44,007 question-answer pairs, drawn from 9,035 videos.
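The annotation format described above (one question, four answer candidates, one of them correct) could be represented roughly as follows. This is only a minimal sketch: the class name, the field names, and the `accuracy` helper are hypothetical and do not reflect the dataset's actual file schema or any official loader.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class MultipleChoiceQA:
    """One multiple-choice question about a clip (all field names are hypothetical)."""
    clip_id: str           # hypothetical identifier for the 60-second clip
    question: str
    candidates: List[str]  # four options: one correct answer, three distractors
    answer_index: int      # position of the correct answer in `candidates`

    def __post_init__(self) -> None:
        # Enforce the four-candidate format described in the text.
        if len(self.candidates) != 4:
            raise ValueError("expected exactly four answer candidates")
        if not 0 <= self.answer_index < 4:
            raise ValueError("answer_index must be in range 0..3")


def accuracy(examples: Sequence[MultipleChoiceQA], predictions: Sequence[int]) -> float:
    """Fraction of examples whose predicted index matches the correct one."""
    if len(examples) != len(predictions):
        raise ValueError("need one prediction per example")
    if not examples:
        return 0.0
    correct = sum(ex.answer_index == p for ex, p in zip(examples, predictions))
    return correct / len(examples)
```

With four candidates per question, uniform random guessing would score about 25% under this accuracy measure, which gives a simple baseline for reading results.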
