HyperAI

Video Narrative Grounding

Video Narrative Grounding is a task that links visual and linguistic information by associating the nouns in a video's narrative with the objects they refer to in the video. The input is a video, a text description of that video, and the positions of the marked nouns within the description. The output is a segmentation mask, in every frame, for the target object of each marked noun. Because it locates objects in video precisely, Video Narrative Grounding is useful for multimodal understanding, video annotation, and content retrieval.
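To make the input and output of the task concrete, here is a minimal sketch of one possible data layout. All names (`NounSpan`, `VNGSample`, `marked_nouns`, `empty_prediction`) are hypothetical and are not taken from any benchmark or library; marking nouns by character offsets and storing masks as nested lists of booleans are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NounSpan:
    """Position of one marked noun in the narrative (assumed: character offsets)."""
    start: int  # inclusive
    end: int    # exclusive


@dataclass
class VNGSample:
    """Hypothetical container for one task input."""
    num_frames: int
    height: int
    width: int
    narrative: str
    noun_spans: List[NounSpan]


# One binary mask per frame: mask[y][x] is True where the object is visible.
Mask = List[List[bool]]


def marked_nouns(sample: VNGSample) -> List[str]:
    """Return the text of each marked noun, in the order the spans are given."""
    return [sample.narrative[s.start:s.end] for s in sample.noun_spans]


def empty_prediction(sample: VNGSample) -> Dict[NounSpan, List[Mask]]:
    """Build an all-background output with the shape the task requires:
    for each marked noun, one height x width mask per frame.
    A real model would fill these masks; this placeholder does not."""
    return {
        span: [
            [[False] * sample.width for _ in range(sample.height)]
            for _ in range(sample.num_frames)
        ]
        for span in sample.noun_spans
    }


sample = VNGSample(
    num_frames=2,
    height=3,
    width=4,
    narrative="A dog chases a ball",
    noun_spans=[NounSpan(2, 5), NounSpan(15, 19)],
)
print(marked_nouns(sample))
prediction = empty_prediction(sample)
print(len(prediction), "nouns,", len(prediction[NounSpan(2, 5)]), "frames each")
```

Keying the output by noun span rather than by noun text keeps two mentions of the same word (for example, two different dogs) distinct, which is one reason a position-based marking is convenient.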

No Data
No benchmark data available for this task