PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation

Recently, universal waveform generation tasks have been investigated under various out-of-distribution scenarios. Although GAN-based methods have shown their strength in fast waveform generation, they are vulnerable to train-inference mismatch scenarios such as two-stage text-to-speech. Meanwhile, diffusion-based models have shown powerful generative performance in other domains; however, they stay out of the limelight due to slow inference speed in waveform generation tasks. Above all, there is no generator architecture that can explicitly disentangle the natural periodic features of high-resolution waveform signals. In this paper, we propose PeriodWave, a novel universal waveform generation model. First, we introduce a period-aware flow matching estimator that can capture the periodic features of the waveform signal when estimating the vector fields. Additionally, we utilize a multi-period estimator that avoids overlaps to capture different periodic features of waveform signals. Although increasing the number of periods can improve the performance significantly, it also incurs a higher computational cost. To mitigate this issue, we also propose a single period-conditional universal estimator that can feed forward in parallel via period-wise batch inference. Additionally, we utilize the discrete wavelet transform to losslessly disentangle the frequency information of waveform signals for high-frequency modeling, and introduce FreeU to reduce the high-frequency noise for waveform generation. The experimental results demonstrate that our model outperforms previous models in both Mel-spectrogram reconstruction and text-to-speech tasks. All source code will be available at https://github.com/sh-lee-prml/PeriodWave.
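
To make the period-aware idea concrete, here is a minimal sketch of one common way to expose periodic structure: folding a 1D waveform into a 2D array whose row length equals a chosen period, so that samples one period apart line up in the same column. This is only an illustration under stated assumptions, not the PeriodWave implementation. The helper name `reshape_by_period`, the zero-padding of the tail, and the period set `PERIODS` are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def reshape_by_period(wave, period):
    """Fold a 1D waveform into a 2D array of shape (frames, period).

    Hypothetical helper: the tail is zero-padded so that the length
    becomes a multiple of `period` (the padding scheme is an assumption).
    """
    wave = np.asarray(wave, dtype=np.float32)
    pad = (-len(wave)) % period          # samples needed to complete the last row
    padded = np.pad(wave, (0, pad))      # zero-pad on the right only
    return padded.reshape(-1, period)


# Illustrative period set (an assumption, not taken from the abstract).
# Pairwise coprime values mean no folding is a coarser copy of another.
PERIODS = (2, 3, 5, 7)

# Toy signal of 20 samples.
wave = np.sin(2 * np.pi * 5 * np.arange(20) / 20)

# One 2D view per period; column j of views[p] holds samples j, j + p, j + 2p, ...
views = {p: reshape_by_period(wave, p) for p in PERIODS}
```

In a multi-period design, each such view would be handed to an estimator for that period. A single period-conditional estimator could instead process all views in one batched pass, which is the kind of period-wise batch inference the abstract describes.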