CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement

Sherif Abdulatif, Ruizhe Cao, Bin Yang

Abstract

In this work, we further develop the conformer-based metric generative adversarial network (CMGAN) model for speech enhancement (SE) in the time-frequency (TF) domain. This paper builds on our previous work but takes a more in-depth look by conducting extensive ablation studies on model inputs and architectural design choices. We rigorously tested the generalization ability of the model to unseen noise types and distortions. We have fortified our claims through DNS-MOS measurements and listening tests. Rather than focusing exclusively on the speech denoising task, we extend this work to address the dereverberation and super-resolution tasks. This necessitated exploring various architectural changes, specifically metric discriminator scores and masking techniques. It is essential to highlight that this is among the earliest works that attempted complex TF-domain super-resolution. Our findings show that CMGAN outperforms existing state-of-the-art methods in the three major speech enhancement tasks: denoising, dereverberation, and super-resolution. For example, in the denoising task using the Voice Bank+DEMAND dataset, CMGAN notably exceeded the performance of prior models, attaining a PESQ score of 3.41 and an SSNR of 11.10 dB. Audio samples and CMGAN implementations are available online.
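
The abstract reports an SSNR (segmental signal-to-noise ratio) of 11.10 dB. As a rough illustration of what this kind of metric measures, the sketch below averages per-frame SNR values between a clean reference and an enhanced signal. The function name `segmental_snr`, the frame length, the hop size, and the clamping range of [-10, 35] dB are assumptions based on common practice and are not taken from the paper, so the exact evaluation setup behind the reported number may differ.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, hop=128,
                  min_db=-10.0, max_db=35.0, eps=1e-10):
    """Mean per-frame SNR in dB, each frame clamped to [min_db, max_db].

    All default parameter values are illustrative assumptions,
    not settings confirmed by the paper.
    """
    n = min(len(clean), len(enhanced))
    frame_scores = []
    for start in range(0, n - frame_len + 1, hop):
        c = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        # Energy of the reference frame and of the residual error.
        signal_energy = sum(x * x for x in c)
        noise_energy = sum((x - y) ** 2 for x, y in zip(c, e))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
        # Clamping keeps silent or perfectly matched frames from dominating.
        frame_scores.append(max(min_db, min(max_db, snr_db)))
    if not frame_scores:
        raise ValueError("signals are shorter than one frame")
    return sum(frame_scores) / len(frame_scores)
```

Averaging over short frames, rather than computing one SNR over the whole utterance, keeps loud segments from masking errors in quieter ones, which is why a segmental score is often reported alongside perceptual metrics such as PESQ.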

