MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
MiraData is a large-scale video dataset developed jointly by Tencent PCG ARC Lab and the Chinese University of Hong Kong in 2024 for long video generation tasks. It is described in the paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions".
Unlike earlier datasets built from short video clips, MiraData focuses on uncut clips of 1 to 2 minutes (72.1 seconds on average). Each video comes with structured captions that describe it from several perspectives, with an average caption length of 318 words, so that the video content is covered comprehensively. There are six caption types: subject description, background, style, camera movement, short description, and dense description.
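The six caption types can be pictured as one record per video. The following is a minimal Python sketch of such a record, not the dataset's actual schema: the class name `StructuredCaption`, its field names, and the sample text are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class StructuredCaption:
    """Hypothetical container for the six caption types of one video."""
    short: str            # short description
    dense: str            # dense description
    subject: str          # subject description
    background: str       # background
    style: str            # style
    camera_movement: str  # camera movement

    def total_words(self) -> int:
        # Count whitespace-separated words across all six caption fields.
        return sum(len(getattr(self, f.name).split()) for f in fields(self))


# Made-up sample text, not taken from the dataset.
example = StructuredCaption(
    short="A cyclist rides along a coastal road.",
    dense="A cyclist in a red jacket pedals steadily along a winding coastal road at sunset.",
    subject="A cyclist wearing a red jacket and a helmet.",
    background="A rocky coastline with waves and an orange sky.",
    style="Realistic, warm color grading.",
    camera_movement="The camera tracks the cyclist from behind.",
)
print(example.total_words())
```

Summing the words over all six fields is one way to check a record against the average caption length reported for the dataset.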
To ensure high-quality clips, the research team filtered the dataset into five subsets based on aesthetics, motion intensity, and color, selecting clips with high visual quality and strong motion intensity. To obtain detailed and accurate descriptions, the team first generated short captions with a state-of-the-art captioning model, then enriched them with GPT-4V to produce dense captions, giving fine-grained video descriptions from multiple perspectives.
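The score-based selection described above can be sketched as a simple threshold filter. This is an illustrative assumption, not the authors' pipeline: the `ClipScores` type, the `select_clips` function, the normalized 0 to 1 score range, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClipScores:
    """Hypothetical per-clip quality scores, each assumed to lie in [0, 1]."""
    clip_id: str
    aesthetic: float
    motion: float
    color: float


def select_clips(
    clips: List[ClipScores],
    min_aesthetic: float = 0.5,  # assumed threshold, not from the paper
    min_motion: float = 0.5,     # assumed threshold, not from the paper
    min_color: float = 0.5,      # assumed threshold, not from the paper
) -> List[ClipScores]:
    # Keep only clips that meet every threshold.
    return [
        c for c in clips
        if c.aesthetic >= min_aesthetic
        and c.motion >= min_motion
        and c.color >= min_color
    ]


candidates = [
    ClipScores("clip_a", aesthetic=0.9, motion=0.8, color=0.7),
    ClipScores("clip_b", aesthetic=0.9, motion=0.2, color=0.7),  # too static
    ClipScores("clip_c", aesthetic=0.3, motion=0.9, color=0.9),  # low aesthetics
]
kept = select_clips(candidates)
print([c.clip_id for c in kept])
```

Running the filter with progressively stricter thresholds would be one way to derive nested subsets of increasing quality.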
MiraData offers a valuable resource, and new challenges, for researchers working on long video generation and video content understanding.