
UniVTG: Towards Unified Video-Language Temporal Grounding

Kevin Qinghong Lin, Pengchuan Zhang, Joya Chen, Shraman Pramanick, Difei Gao, Alex Jinpeng Wang, Rui Yan, Mike Zheng Shou
Abstract

Video Temporal Grounding (VTG), which aims to ground target clips from videos (such as consecutive intervals or disjoint shots) according to custom language queries (e.g., sentences or words), is key for video browsing on social media. Most methods in this direction develop task-specific models that are trained with type-specific labels, such as moment retrieval (time interval) and highlight detection (worthiness curve), which limits their ability to generalize to various VTG tasks and labels. In this paper, we propose to Unify the diverse VTG labels and tasks, dubbed UniVTG, along three directions: Firstly, we revisit a wide range of VTG labels and tasks and define a unified formulation. Based on this, we develop data annotation schemes to create scalable pseudo supervision. Secondly, we develop an effective and flexible grounding model capable of addressing each task and making full use of each label. Lastly, thanks to the unified framework, we are able to unlock temporal grounding pretraining from large-scale diverse labels and develop stronger grounding abilities, e.g., zero-shot grounding. Extensive experiments on three tasks (moment retrieval, highlight detection, and video summarization) across seven datasets (QVHighlights, Charades-STA, TACoS, Ego4D, YouTube Highlights, TVSum, and QFVS) demonstrate the effectiveness and flexibility of our proposed framework. The codes are available at https://github.com/showlab/UniVTG.
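As a rough illustration of what a unified per-clip formulation could look like (a minimal sketch based only on the abstract, not the authors' released code; all names below are hypothetical), the same clip-level labels can be read in an interval-style way for moment retrieval and in a curve-style way for highlight detection:

from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class ClipLabel:
    foreground: bool    # whether the clip lies inside the queried interval
    saliency: float     # query-relevance ("worthiness") score in [0, 1]

def moment_from_labels(labels: List[ClipLabel], clip_len: float) -> Optional[Tuple[float, float]]:
    # Moment-retrieval view: the interval spanned by foreground clips.
    fg = [i for i, lab in enumerate(labels) if lab.foreground]
    if not fg:
        return None
    return fg[0] * clip_len, (fg[-1] + 1) * clip_len

def highlights_from_labels(labels: List[ClipLabel], top_k: int = 3) -> List[int]:
    # Highlight-detection view: clip indices ranked by saliency.
    order = sorted(range(len(labels)), key=lambda i: labels[i].saliency, reverse=True)
    return order[:top_k]

# Example: five two-second clips scored against one query.
labels = [ClipLabel(False, 0.1), ClipLabel(True, 0.7),
          ClipLabel(True, 0.9), ClipLabel(True, 0.6), ClipLabel(False, 0.2)]
print(moment_from_labels(labels, clip_len=2.0))   # (2.0, 8.0)
print(highlights_from_labels(labels))             # [2, 1, 3]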
