8 months ago

Audio and Speech Processing

Multi-Task Learning

Method/Architecture

Yan Ru Pei, Ritik Shrivastava, FNU Sidharth

Abstract

We present aTENNuate, a simple deep state-space autoencoder configured for efficient online raw speech enhancement in an end-to-end fashion. The network's performance is primarily evaluated on raw speech denoising, with additional assessments on tasks such as super-resolution and de-quantization. We benchmark aTENNuate on the VoiceBank + DEMAND and the Microsoft DNS1 synthetic test sets. The network outperforms previous real-time denoising models in terms of PESQ score, parameter count, MACs, and latency. Even as a raw waveform processing model, the model maintains high fidelity to the clean signal with minimal audible artifacts. In addition, the model remains performant even when the noisy input is compressed down to 4000 Hz and 4 bits, suggesting general speech enhancement capabilities in low-resource environments. Try it out by pip install attenuate.
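To make the "4000 Hz and 4 bits" input condition concrete, the following is a minimal NumPy sketch of that kind of degradation: low-pass filtering and decimating a waveform to a 4000 Hz sample rate, then rounding it to 16 uniform amplitude levels. It is an illustration only, not the authors' preprocessing. The function names (lowpass_fir, downsample, quantize), the windowed-sinc filter, the 16 kHz source rate, and the uniform (rather than companded) quantizer are all assumptions introduced here, as is reading "4000 Hz" as a sample rate.

```python
import numpy as np


def lowpass_fir(cutoff_hz, sample_rate, num_taps=101):
    """Windowed-sinc low-pass FIR taps (Hamming window), normalized to unit DC gain."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    taps = 2 * fc * np.sinc(2 * fc * n) * np.hamming(num_taps)
    return taps / taps.sum()


def downsample(signal, sample_rate, target_rate):
    """Anti-alias filter, then keep every k-th sample (integer rate ratios only)."""
    factor = sample_rate // target_rate
    if factor < 1 or factor * target_rate != sample_rate:
        raise ValueError("sample_rate must be an integer multiple of target_rate")
    # Cut off at the new Nyquist frequency to limit aliasing.
    taps = lowpass_fir(0.5 * target_rate, sample_rate)
    filtered = np.convolve(signal, taps, mode="same")
    return filtered[::factor]


def quantize(signal, bits):
    """Uniformly quantize a waveform in [-1, 1] to 2**bits amplitude levels."""
    levels = 2 ** bits
    clipped = np.clip(signal, -1.0, 1.0)
    codes = np.round((clipped + 1.0) / 2.0 * (levels - 1)).astype(np.int64)
    return codes / (levels - 1) * 2.0 - 1.0  # map integer codes back to [-1, 1]


# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
source_rate = 16000
t = np.arange(source_rate) / source_rate
clean = 0.8 * np.sin(2 * np.pi * 440.0 * t)

# Degrade to a 4000 Hz sample rate and 4-bit amplitude resolution.
degraded = quantize(downsample(clean, source_rate, 4000), bits=4)
```

A waveform degraded this way is the kind of input the super-resolution and de-quantization tasks mentioned in the abstract would start from; the exact degradation used in the paper's experiments may differ.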

