VidSTG Large-Scale Video Grounding Dataset
Date
3 years ago
Publish URL
License
Other
Categories

The VidSTG dataset is a spatio-temporal video grounding (STVG) dataset built on the VidOR dataset. VidOR is a video relation dataset containing 7,000, 835, and 2,165 videos for training, validation, and testing, respectively. Given an untrimmed video and a sentence describing a target object, the spatio-temporal video grounding task is to localize a spatio-temporal tube for that object: the time segment in which it appears and its bounding box in each frame of that segment.
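To make "spatio-temporal tube" concrete, here is a minimal Python sketch. It is not the official VidSTG tooling. A tube is modeled as a mapping from frame index to bounding box, and a predicted tube is compared with a ground-truth tube using a temporal overlap (tIoU) and a spatio-temporal overlap (vIoU). The names `Tube`, `box_iou`, `temporal_iou`, and `spatiotemporal_iou` are hypothetical. The `(x1, y1, x2, y2)` box format and the frame-set formulation of the two scores are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class Tube:
    """Hypothetical tube: one bounding box per frame of a time segment."""
    boxes: Dict[int, Box]  # frame index -> box

    def frames(self) -> Set[int]:
        return set(self.boxes)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def temporal_iou(pred: Tube, gt: Tube) -> float:
    """Overlap of the two tubes' frame sets (temporal localization only)."""
    union = pred.frames() | gt.frames()
    inter = pred.frames() & gt.frames()
    return len(inter) / len(union) if union else 0.0


def spatiotemporal_iou(pred: Tube, gt: Tube) -> float:
    """Box IoU summed over shared frames, normalized by the frame union."""
    union = pred.frames() | gt.frames()
    if not union:
        return 0.0
    inter = pred.frames() & gt.frames()
    return sum(box_iou(pred.boxes[t], gt.boxes[t]) for t in inter) / len(union)


if __name__ == "__main__":
    # Toy example: ground truth spans frames 0-9, prediction spans 5-14
    # with a horizontally shifted box.
    gt = Tube({t: (10.0, 10.0, 50.0, 50.0) for t in range(0, 10)})
    pred = Tube({t: (30.0, 10.0, 70.0, 50.0) for t in range(5, 15)})
    print("tIoU:", temporal_iou(pred, gt))
    print("vIoU:", spatiotemporal_iou(pred, gt))
```

Dividing by the union of frames, rather than by the shared frames only, means that a prediction which finds the right boxes in the wrong time segment is still penalized.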