Neural Vocoder is All You Need for Speech Super-resolution

Haohe Liu Woosung Choi Xubo Liu Qiuqiang Kong Qiao Tian DeLiang Wang

Abstract

Speech super-resolution (SR) is the task of increasing the speech sampling rate by generating high-frequency components. Existing speech SR methods are trained in constrained experimental settings, such as a fixed upsampling ratio. These strong constraints can potentially lead to poor generalization ability in mismatched real-world cases. In this paper, we propose a neural vocoder based speech super-resolution method (NVSR) that can handle a variety of input resolutions and upsampling ratios. NVSR consists of a mel-bandwidth extension module, a neural vocoder module, and a post-processing module. Our proposed system achieves state-of-the-art results on the VCTK multi-speaker benchmark. On 44.1 kHz target resolution, NVSR outperforms WSRGlow and Nu-wave by 8% and 37% respectively on log spectral distance and achieves a significantly better perceptual quality. We also demonstrate that prior knowledge in the pre-trained vocoder is crucial for speech SR by performing mel-bandwidth extension with a simple replication-padding method. Samples can be found at https://haoheliu.github.io/nvsr.
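The abstract reports improvements in log spectral distance (LSD) without defining it. As a rough illustration only, the sketch below computes LSD in the form commonly used in speech SR work: the per-frame root-mean-square difference of log power spectra, averaged over frames. The function names (`stft_power`, `log_spectral_distance`), the Hann window, the FFT size, the hop length, and the `eps` floor are all assumptions made for this example; they are not taken from the paper, whose exact evaluation settings may differ.

```python
import numpy as np


def stft_power(x, n_fft=2048, hop=512):
    """Power spectrogram of a 1-D signal, shape (frames, n_fft // 2 + 1).

    Assumed settings for illustration: Hann window, no padding.
    The signal must contain at least n_fft samples.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def log_spectral_distance(ref, est, eps=1e-8, n_fft=2048, hop=512):
    """Mean over frames of the RMS log10 power-spectrum difference.

    Lower is better; identical signals give 0.
    eps is a hypothetical floor that keeps log10 finite on silent bins.
    """
    p_ref = stft_power(ref, n_fft=n_fft, hop=hop)
    p_est = stft_power(est, n_fft=n_fft, hop=hop)
    log_diff = np.log10(p_ref + eps) - np.log10(p_est + eps)
    # RMS over frequency bins for each frame, then average over frames.
    return float(np.mean(np.sqrt(np.mean(log_diff ** 2, axis=1))))
```

Both signals are expected to share the same sampling rate and length, so a super-resolved output would be compared against the full-band reference at the target rate (for example 44.1 kHz).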

