8 months ago

Audio and Speech Processing

Haohe Liu Woosung Choi Xubo Liu Qiuqiang Kong Qiao Tian DeLiang Wang

Abstract

Speech super-resolution (SR) is a task to increase speech sampling rate by generating high-frequency components. Existing speech SR methods are trained in constrained experimental settings, such as a fixed upsampling ratio. These strong constraints can potentially lead to poor generalization ability in mismatched real-world cases. In this paper, we propose a neural vocoder based speech super-resolution method (NVSR) that can handle a variety of input resolution and upsampling ratios. NVSR consists of a mel-bandwidth extension module, a neural vocoder module, and a post-processing module. Our proposed system achieves state-of-the-art results on the VCTK multi-speaker benchmark. On 44.1 kHz target resolution, NVSR outperforms WSRGlow and Nu-wave by 8% and 37% respectively on log spectral distance and achieves a significantly better perceptual quality. We also demonstrate that prior knowledge in the pre-trained vocoder is crucial for speech SR by performing mel-bandwidth extension with a simple replication-padding method. Samples can be found in https://haoheliu.github.io/nvsr.
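
The abstract compares systems on log spectral distance (LSD) without defining it. The following is a minimal sketch, assuming the definition commonly used in speech SR evaluations: for each frame, the root mean square over frequency bins of the log10 ratio between reference and estimated power spectra, averaged over frames. The function name `log_spectral_distance`, the `eps` stabilizer, and the nested-list input layout are illustrative assumptions and are not taken from the paper; the exact STFT settings the authors used are not stated here.

```python
import math


def log_spectral_distance(ref_mag, est_mag, eps=1e-8):
    """Hypothetical helper: LSD between two magnitude spectrograms.

    ref_mag, est_mag: lists of frames, each frame a list of non-negative
    magnitude values (one per frequency bin). Both must share one shape.
    eps is an assumed stabilizer that keeps the log finite for zero bins.
    """
    if not ref_mag or len(ref_mag) != len(est_mag):
        raise ValueError("spectrograms must be non-empty with equal frame counts")

    total = 0.0
    for ref_frame, est_frame in zip(ref_mag, est_mag):
        if not ref_frame or len(ref_frame) != len(est_frame):
            raise ValueError("frames must be non-empty with equal bin counts")
        squared_sum = 0.0
        for r, e in zip(ref_frame, est_frame):
            # Log ratio of power spectra (magnitude squared) for this bin.
            diff = math.log10((r * r + eps) / (e * e + eps))
            squared_sum += diff * diff
        # Root mean square over frequency bins for this frame.
        total += math.sqrt(squared_sum / len(ref_frame))
    # Average over frames.
    return total / len(ref_mag)


if __name__ == "__main__":
    reference = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    # A copy scaled by 10 in magnitude differs by 2 in log10 power per bin.
    scaled = [[10.0 * v for v in frame] for frame in reference]
    print(log_spectral_distance(reference, reference))
    print(log_spectral_distance(reference, scaled))
```

Lower values mean the estimated spectrum is closer to the reference; identical inputs give zero. In practice the magnitude spectrograms would come from an STFT of the target and the super-resolved waveforms.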
