TVQA Video Question Answering Dataset
Date: 3 years ago
License: Other

TVQA is a large-scale video question-answering dataset built from 6 classic American TV series. It contains about 152.5K question-answer pairs drawn from 21.8K video clips, each 60-90 seconds long, for a total of more than 460 hours of video. The question-answer pairs are split into training, validation, and test sets in a ratio of 8:1:1.
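As a quick sanity check on these figures, the sketch below derives approximate split sizes from the 8:1:1 ratio, along with the average clip length and the number of questions per clip. It uses only the rounded numbers quoted above, so the results are estimates rather than official counts; the helper `split_sizes` is a hypothetical name introduced for illustration.

```python
# Approximate figures quoted in the description above (rounded, not official counts).
NUM_QA_PAIRS = 152_500   # "about 152.5K question-answer pairs"
NUM_CLIPS = 21_800       # "21.8K video clips"
TOTAL_HOURS = 460        # "more than 460 hours", used here as a lower bound


def split_sizes(total, ratio=(8, 1, 1)):
    """Split `total` items by an integer ratio; any remainder goes to the first split."""
    parts = [total * r // sum(ratio) for r in ratio]
    parts[0] += total - sum(parts)
    return tuple(parts)


train, val, test = split_sizes(NUM_QA_PAIRS)
avg_clip_seconds = TOTAL_HOURS * 3600 / NUM_CLIPS
qa_per_clip = NUM_QA_PAIRS / NUM_CLIPS

print(train, val, test)            # estimated split sizes
print(round(avg_clip_seconds, 1))  # estimated average clip length in seconds
print(round(qa_per_clip, 1))       # estimated questions per clip
```

Under these rounded inputs the average clip length falls inside the stated 60-90 second range, and each clip carries roughly seven questions.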
The questions in TVQA are compositional: they combine question answering with temporal localization, and each question is annotated with the span of the clip it refers to. Answering them requires a model to localize the relevant moment in time and to understand both the dialogue (subtitles) and the video.
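To make the link between localization and dialogue concrete, here is a minimal sketch of how a localized question could be used to pick out the relevant subtitle lines. The class names, field names, and sample data are all hypothetical and do not reflect the dataset's actual file format.

```python
from dataclasses import dataclass


@dataclass
class SubtitleLine:
    start: float  # seconds from the start of the clip
    end: float
    text: str


@dataclass
class LocalizedQuestion:
    question: str
    start: float  # start of the annotated span, in seconds
    end: float    # end of the annotated span, in seconds


def subtitles_in_span(question, subtitles):
    """Return the subtitle lines whose time range overlaps the question's span."""
    return [s for s in subtitles
            if s.start < question.end and s.end > question.start]


# Made-up sample data for illustration only.
subs = [
    SubtitleLine(0.0, 4.0, "Hey, have you seen my keys?"),
    SubtitleLine(10.0, 14.5, "They are on the kitchen table."),
    SubtitleLine(40.0, 43.0, "Thanks, see you tonight."),
]
q = LocalizedQuestion("Where are the keys when she asks about them?", 8.0, 16.0)
relevant = subtitles_in_span(q, subs)
print([s.text for s in relevant])
```

A model would still need the video frames from the same span; the sketch covers only the subtitle side of the problem.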