HyperAI
2 months ago

Saliency-Guided DETR for Moment Retrieval and Highlight Detection

Gordeev, Aleksandr ; Dokholyan, Vladimir ; Tolstykh, Irina ; Kuprashevich, Maksim
Abstract

Existing approaches for video moment retrieval and highlight detection are not able to align text and video features efficiently, resulting in unsatisfying performance and limited production usage. To address this, we propose a novel architecture that utilizes recent foundational video models designed for such alignment. Combined with the introduced Saliency-Guided Cross Attention mechanism and a hybrid DETR architecture, our approach significantly enhances performance in both moment retrieval and highlight detection tasks. For even better improvement, we developed InterVid-MR, a large-scale and high-quality dataset for pretraining. Using it, our architecture achieves state-of-the-art results on the QVHighlights, Charades-STA and TACoS benchmarks. The proposed approach provides an efficient and scalable solution for both zero-shot and fine-tuning scenarios in video-language tasks.
