
ADA-VAD: Unpaired Adversarial Domain Adaptation for Noise-Robust Voice Activity Detection

Jong Hwan Ko, Jiho Chang, Taesoo Kim
Abstract

Voice Activity Detection (VAD) is becoming an essential front-end component in various speech processing systems. Because those systems are commonly deployed in environments with diverse noise types and low signal-to-noise ratios (SNRs), an effective VAD method should robustly detect speech regions against noisy background signals. In this paper, we propose adversarial domain adaptive VAD (ADA-VAD), a deep neural network (DNN) based VAD method that is highly robust to audio samples with various noise types and low SNRs. The proposed method trains DNN models for the VAD task in a supervised manner. Simultaneously, to mitigate the performance degradation caused by background noise, adversarial domain adaptation is used to reduce the domain discrepancy between noisy and clean audio streams in an unsupervised manner. The results show that ADA-VAD achieves an average AUC that is 3.6%p and 7%p higher than models trained with manually extracted features on the AVA-speech dataset and on a speech database synthesized with an unseen noise database, respectively.
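
The gains above are reported as percentage-point (%p) differences in AUC, the area under the ROC curve of frame-level speech/non-speech scores. As a minimal sketch of how such a difference could be computed, the Python below uses the pairwise ranking definition of AUC. The function name `roc_auc`, the labels, and both score lists are hypothetical and made up for illustration; they are not data or code from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random speech frame (label 1)
    receives a higher score than a random non-speech frame (label 0).
    Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one speech and one non-speech frame")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical frame labels (1 = speech, 0 = non-speech) and
# hypothetical scores from two made-up models on the same frames.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
baseline_scores = [0.9, 0.4, 0.6, 0.5, 0.2, 0.7, 0.8, 0.1]
adapted_scores = [0.9, 0.6, 0.7, 0.5, 0.2, 0.65, 0.8, 0.1]

auc_baseline = roc_auc(labels, baseline_scores)
auc_adapted = roc_auc(labels, adapted_scores)

# Difference expressed in percentage points (%p), as in the abstract.
gain_pp = 100.0 * (auc_adapted - auc_baseline)
print(f"baseline AUC = {auc_baseline:.4f}, adapted AUC = {auc_adapted:.4f}, gain = {gain_pp:.2f}%p")
```

The pairwise form is quadratic in the number of frames, which is fine for a toy example; for real evaluation sets a rank-based or library implementation would be the more practical choice.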
