
Live Video Captioning

Blanco-Fernández, Eduardo ; Gutiérrez-Álvarez, Carlos ; Nasri, Nadia ; Maldonado-Bascón, Saturnino ; López-Sastre, Roberto J.
Abstract

Dense video captioning involves detecting and describing events within video sequences. Traditional methods operate in an offline setting, assuming the entire video is available for analysis. In contrast, in this work we introduce a groundbreaking paradigm: Live Video Captioning (LVC), where captions must be generated for video streams in an online manner. This shift brings unique challenges, including processing partial observations of the events and the need for a temporal anticipation of the actions. We formally define the novel problem of LVC and propose innovative evaluation metrics specifically designed for this online scenario, demonstrating their advantages over traditional metrics. To address the novel complexities of LVC, we present a new model that combines deformable transformers with temporal filtering, enabling effective captioning over video streams. Extensive experiments on the ActivityNet Captions dataset validate the proposed approach, showcasing its superior performance in the LVC setting compared to state-of-the-art offline methods. To foster further research, we provide the results of our model and an evaluation toolkit with the new metrics integrated at: https://github.com/gramuah/lvc.
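The difference between the offline and online settings described above can be illustrated with a minimal Python sketch. This is not the authors' model or their evaluation protocol: `live_captioning`, `caption_prefix`, `stride`, `toy_captioner` and the `(start, end, sentence)` event format are hypothetical names chosen for illustration. The only property the sketch encodes is the one stated in the abstract: at each step the captioner receives the frames observed so far (a partial observation) and never any future frames.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical event format: (start_frame, end_frame, sentence).
Event = Tuple[int, int, str]


def live_captioning(
    stream: Iterable[object],
    caption_prefix: Callable[[Sequence[object]], List[Event]],
    stride: int = 1,
) -> List[Tuple[int, List[Event]]]:
    """Run a captioner over a video stream in an online manner.

    At step t the captioner only receives frames 0..t; future frames are
    never passed to it. A prediction is requested every `stride` frames.
    Returns a list of (t, events) pairs, one per prediction step.
    """
    observed: List[object] = []
    outputs: List[Tuple[int, List[Event]]] = []
    for t, frame in enumerate(stream):
        observed.append(frame)
        if (t + 1) % stride == 0:
            # Pass a copy so the captioner cannot alter the observed buffer.
            outputs.append((t, caption_prefix(list(observed))))
    return outputs


def toy_captioner(frames: Sequence[object]) -> List[Event]:
    # Hypothetical stand-in for a real model: one event spanning the prefix.
    return [(0, len(frames) - 1, f"something happens over {len(frames)} frames")]


# Predictions are requested after frames 1, 3 and 5 of a 6-frame stream.
results = live_captioning(range(6), toy_captioner, stride=2)
```

An offline method would call the captioner once on the complete video; here each prediction must be made from a growing prefix, which is why online methods have to cope with events that are still in progress.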
