Temporal Sentence Grounding
Temporal Sentence Grounding (TSG) is a computer vision task that aims to locate the specific moment in an untrimmed video that corresponds to a given natural language query. Methods for this task can draw on supervision at different levels to improve localization accuracy and generalization: weak supervision (a set of video-level action categories), semi-weak supervision (a set of video-level action categories plus a few timestamped action annotations), and full supervision (all action categories and their time intervals annotated in the untrimmed video). TSG has significant application value in video retrieval, content understanding, and human-computer interaction.
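The three supervision levels can be pictured as differences in how much timing information an annotation record carries. The following is a minimal sketch under stated assumptions, not a standard dataset format: the `VideoAnnotation` record, its field names, and the `supervision_level` helper are all hypothetical, and "full" is taken here to mean that every listed category has at least one annotated interval.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (start_sec, end_sec) of one annotated moment; the unit is an assumption.
Interval = Tuple[float, float]


@dataclass
class VideoAnnotation:
    """Hypothetical annotation record for one untrimmed video."""
    video_id: str
    # Video-level action categories, available at every supervision level.
    categories: List[str]
    # Time intervals per category; may cover none, some, or all categories.
    intervals: Dict[str, List[Interval]] = field(default_factory=dict)


def supervision_level(ann: VideoAnnotation) -> str:
    """Classify a record by how many of its categories carry timestamps."""
    all_categories = set(ann.categories)
    timed = {c for c in all_categories if ann.intervals.get(c)}
    if not timed:
        return "weak"        # categories only, no timestamps
    if timed != all_categories:
        return "semi-weak"   # categories plus a few timestamped annotations
    return "full"            # every category has annotated intervals


weak = VideoAnnotation("v1", ["run", "jump"])
semi = VideoAnnotation("v1", ["run", "jump"], {"run": [(3.0, 7.5)]})
full = VideoAnnotation(
    "v1", ["run", "jump"], {"run": [(3.0, 7.5)], "jump": [(9.0, 10.2)]}
)
print(supervision_level(weak), supervision_level(semi), supervision_level(full))
```

In this sketch the same video moves from weak to full supervision purely by adding interval annotations, which mirrors the increasing annotation cost of the three levels.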