Video Visual Relation Detection

Video Visual Relation Detection (VidVRD) is a computer vision task that detects instances of visual relations of interest in videos. Each instance is represented by a relation triplet <subject, predicate, object> together with the trajectories of its subject and object. Compared with static images, videos provide dynamic, temporally varying features, which help capture more natural visual relations. However, VidVRD is technically more challenging than visual relation detection in images, because it requires highly accurate object tracking and must cope with a greater diversity of relation representations. The application value of the task lies in enabling a deeper understanding of video content, which supports advanced scene analysis and action recognition.
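
To make the representation concrete, the following is a minimal sketch of how one relation instance could be stored and how two trajectories could be compared. The names `RelationInstance` and `trajectory_viou`, the frame-indexed dictionary layout, and the `(x1, y1, x2, y2)` box format are assumptions made for illustration; they do not come from a specific dataset or library. The overlap measure shown is a simple volumetric IoU, one plausible way to check how well a predicted trajectory matches a reference one.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Assumed formats: a box is (x1, y1, x2, y2); a trajectory maps frame index -> box.
Box = Tuple[float, float, float, float]
Trajectory = Dict[int, Box]


@dataclass
class RelationInstance:
    """One relation instance: a triplet plus subject and object trajectories."""
    subject: str
    predicate: str
    object: str
    subject_traj: Trajectory
    object_traj: Trajectory

    def triplet(self) -> Tuple[str, str, str]:
        return (self.subject, self.predicate, self.object)


def box_area(box: Box) -> float:
    """Area of a box; degenerate boxes count as zero."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def box_intersection(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return box_area(overlap)


def trajectory_viou(a: Trajectory, b: Trajectory) -> float:
    """Volumetric IoU: per-frame box overlap summed over time, divided by the union volume."""
    shared_frames = a.keys() & b.keys()
    inter = sum(box_intersection(a[f], b[f]) for f in shared_frames)
    volume_a = sum(box_area(box) for box in a.values())
    volume_b = sum(box_area(box) for box in b.values())
    union = volume_a + volume_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical example: a dog chasing a ball over frames 0-1.
instance = RelationInstance(
    subject="dog",
    predicate="chase",
    object="ball",
    subject_traj={0: (0, 0, 10, 10), 1: (2, 0, 12, 10)},
    object_traj={0: (30, 5, 34, 9), 1: (33, 5, 37, 9)},
)
print(instance.triplet())
```

Keeping trajectories keyed by frame index means two tracks that cover different time spans can still be compared: frames present in only one track add to the union but not to the intersection, so temporal misalignment lowers the score.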