Weakly Supervised Temporal Action Localization
Weakly supervised temporal action localization is a computer vision task in which a model is trained with only video-level labels, yet must identify which actions occur in a video and when they occur. The goal is to predict the start and end time of each action instance without frame-level or segment-level annotations during training, so the model has to infer temporal boundaries from video-level supervision alone. Because video-level labels are far cheaper to collect than temporal boundaries, this setting reduces annotation cost and makes it practical to train on larger and more varied video collections, which can improve generalization and makes the task valuable for large-scale video analysis and understanding.
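To make the setup concrete, here is a minimal sketch of one common way such a pipeline can be arranged. It is an illustration, not a method described above. It assumes a model has already produced one score per fixed-length snippet for a single action class. The function names `video_level_score` and `localize`, the top-k pooling, the threshold, and the snippet length are all hypothetical choices made for this example. During training, only the pooled video-level score would be compared against the video-level label; at inference, the per-snippet scores are thresholded to recover start and end times.

```python
from typing import List, Tuple


def video_level_score(snippet_scores: List[float], k: int) -> float:
    """Pool per-snippet scores into one video-level score (mean of the top k).

    Assumption: this pooled score is the only quantity supervised by the
    video-level label during training.
    """
    if not snippet_scores:
        raise ValueError("snippet_scores must not be empty")
    k = max(1, min(k, len(snippet_scores)))  # clamp k to a valid range
    top_k = sorted(snippet_scores, reverse=True)[:k]
    return sum(top_k) / k


def localize(
    snippet_scores: List[float], threshold: float, snippet_seconds: float
) -> List[Tuple[float, float]]:
    """Merge consecutive snippets scoring >= threshold into (start, end) segments in seconds."""
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first snippet in the currently open segment
    for i, score in enumerate(snippet_scores):
        if score >= threshold and start is None:
            start = i  # open a new segment
        elif score < threshold and start is not None:
            segments.append((start * snippet_seconds, i * snippet_seconds))
            start = None  # close the segment
    if start is not None:  # segment runs to the end of the video
        segments.append((start * snippet_seconds, len(snippet_scores) * snippet_seconds))
    return segments


# Hypothetical per-snippet scores for one action class, 2-second snippets.
scores = [0.1, 0.2, 0.9, 0.8, 0.1, 0.7, 0.1]
pooled = video_level_score(scores, k=2)
segments = localize(scores, threshold=0.5, snippet_seconds=2.0)
```

In this sketch the temporal boundaries are never supervised directly. They emerge only because the pooled score depends on the highest-scoring snippets, which is the sense in which localization is learned from video-level labels alone.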