TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning

Chaeyoung Jung¹*, Suyeon Lee¹*, Kihyun Nam¹, Kyeongha Rho¹, You Jin Kim², Youngjoon Jang¹, Joon Son Chung¹

Abstract

The goal of this work is Active Speaker Detection (ASD), the task of determining whether a person is speaking in a series of video frames. Previous work has dealt with the task by exploring network architectures, while learning effective representations has been less explored. In this work, we propose TalkNCE, a novel talk-aware contrastive loss. The loss is applied only to the parts of the full segments in which a person on the screen is actually speaking. This encourages the model to learn effective representations through the natural correspondence between speech and facial movements. Our loss can be jointly optimized with the existing objectives for training ASD models without the need for additional supervision or training data. The experiments demonstrate that our loss can be easily integrated into existing ASD frameworks, improving their performance. Our method achieves state-of-the-art performance on the AVA-ActiveSpeaker and ASW datasets.
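The abstract does not give the loss formula, so the following is only a minimal sketch of what a talk-aware contrastive loss could look like, assuming a standard InfoNCE objective restricted to frames labeled as speaking. The function name `talk_aware_nce`, the per-frame embedding layout, the visual-to-audio direction, and the temperature value are all illustrative assumptions, not details taken from the paper.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def talk_aware_nce(audio, visual, speaking, temperature=0.1):
    """Hypothetical InfoNCE loss computed only over frames flagged as speaking.

    audio, visual: per-frame embeddings (lists of equal-length float lists).
    speaking: per-frame 0/1 labels; frames with 0 are ignored entirely.
    The temperature default is an assumption, not a value from the paper.
    """
    active = [t for t, flag in enumerate(speaking) if flag]
    if len(active) < 2:
        return 0.0  # no negatives available, so the term contributes nothing

    a = [_normalize(audio[t]) for t in active]
    v = [_normalize(visual[t]) for t in active]

    total = 0.0
    for i in range(len(active)):
        # Similarity of visual frame i to every active audio frame.
        logits = [
            sum(x * y for x, y in zip(v[i], a[j])) / temperature
            for j in range(len(active))
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # The positive pair is the audio frame at the same time step.
        total += log_denom - logits[i]
    return total / len(active)


# Toy example: the last frame is silent and therefore excluded.
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
labels = [1, 1, 0]

loss_aligned = talk_aware_nce(audio, aligned, labels)
loss_swapped = talk_aware_nce(audio, swapped, labels)
```

In this toy setup, visual embeddings that match the audio at the same time step give a much smaller loss than swapped ones, which is the behavior a contrastive term of this kind is meant to reward. In a joint training setup such a term would be added to the existing ASD classification objective with some weighting, which is likewise left unspecified here.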

