Video Narrative Grounding
Video Narrative Grounding is a task that links visual and linguistic information by associating a video's narrative with the specific objects it mentions in the video. The input is a video, a text narrative describing it, and the positions of the nouns marked within that narrative. The required output is a segmentation mask, in every frame, of the target object that each marked noun refers to. Because it requires accurately localizing objects throughout a video, Video Narrative Grounding is valuable for applications such as multimodal understanding, video annotation, and content retrieval.
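The input/output contract described above can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not a format defined by any particular benchmark: the class and function names (`NarrativeGroundingSample`, `empty_prediction`, `check_prediction`) are hypothetical, the video is assumed to be a `(T, H, W, 3)` RGB array, marked nouns are assumed to be given as character offsets into the narrative, and predictions are assumed to be one boolean mask per marked noun per frame.

```python
from __future__ import annotations

from dataclasses import dataclass

import numpy as np


@dataclass
class NarrativeGroundingSample:
    """Hypothetical container for one task instance (assumed layout)."""

    frames: np.ndarray                 # assumed (T, H, W, 3) uint8 RGB video
    narrative: str                     # text narrative describing the video
    noun_spans: list[tuple[int, int]]  # assumed [start, end) char offsets of marked nouns

    def marked_nouns(self) -> list[str]:
        # Recover the surface form of each marked noun from its offsets.
        return [self.narrative[start:end] for start, end in self.noun_spans]


def empty_prediction(sample: NarrativeGroundingSample) -> np.ndarray:
    # One boolean mask per marked noun per frame: shape (N, T, H, W).
    t, h, w, _ = sample.frames.shape
    return np.zeros((len(sample.noun_spans), t, h, w), dtype=bool)


def check_prediction(sample: NarrativeGroundingSample, masks: np.ndarray) -> None:
    # Verify that a prediction covers every marked noun in every frame.
    t, h, w, _ = sample.frames.shape
    expected = (len(sample.noun_spans), t, h, w)
    if masks.shape != expected:
        raise ValueError(f"expected masks of shape {expected}, got {masks.shape}")
    if masks.dtype != np.bool_:
        raise ValueError("masks must be boolean")
```

For example, a 4-frame, 8×8 clip narrated as "A man throws a ball to a dog." with the spans `(2, 5)`, `(15, 19)` and `(25, 28)` marks the nouns "man", "ball" and "dog", so a valid prediction is a `(3, 4, 8, 8)` boolean array. Keeping nouns as offsets rather than bare strings keeps repeated words (two mentions of "dog", say) distinguishable.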