HyperAI

Video Narration Captioning is a computer vision sub-task that predicts a narration caption for each shot in a multi-shot video. The task adds Automatic Speech Recognition (ASR) text as an additional input and uses the same model architecture as single-shot video captioning; only the prediction target changes, to the narration caption. Narration captions supply background knowledge and reflect the commentator's perspective, which makes them valuable for understanding video content.
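As a rough illustration of the task setup, the sketch below builds the input and target for one shot under both tasks. The names `ShotExample` and `make_training_pair` are hypothetical and belong to no published codebase. It also assumes that the single-shot task receives no ASR text, which the description implies but does not state outright.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ShotExample:
    """One shot of a multi-shot video (hypothetical container)."""
    frame_paths: List[str]                   # frames sampled from the shot
    asr_text: str                            # ASR transcript for the shot; may be empty
    visual_caption: Optional[str] = None     # target for single-shot captioning
    narration_caption: Optional[str] = None  # target for narration captioning


def make_training_pair(example: ShotExample, task: str) -> Dict[str, object]:
    """Return the model input and target for one shot.

    The visual input is identical for both tasks. Only the text input
    and the prediction target differ.
    """
    if task == "single_shot":
        # Assumption: single-shot captioning uses visual input only.
        return {
            "frames": example.frame_paths,
            "text_input": "",
            "target": example.visual_caption,
        }
    if task == "narration":
        # Narration captioning adds the ASR text as an extra input.
        return {
            "frames": example.frame_paths,
            "text_input": example.asr_text.strip(),
            "target": example.narration_caption,
        }
    raise ValueError(f"unknown task: {task!r}")
```

Because the frames pass through unchanged, the same captioning model can in principle be trained on either pair; which task it learns depends on the text input and target it is given.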

Shot2Story20K

Ours
