Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization

This paper introduces PeriodWave-Turbo, a high-fidelity and highly efficient waveform generation model trained via adversarial flow matching optimization. Recently, conditional flow matching (CFM) generative models have been successfully adopted for waveform generation tasks, leveraging a single vector field estimation objective for training. Although these models can generate high-fidelity waveform signals, they require significantly more ODE steps than GAN-based models, which need only a single generation step. Additionally, the generated samples often lack high-frequency information due to noisy vector field estimation, which fails to ensure accurate high-frequency reproduction. To address this limitation, we enhance pre-trained CFM-based generative models by converting them into fixed-step generators. We utilize reconstruction losses and adversarial feedback to accelerate high-fidelity waveform generation. Through adversarial flow matching optimization, only 1,000 steps of fine-tuning are required to achieve state-of-the-art performance across various objective metrics. Moreover, we significantly reduce the number of inference steps from 16 to 2 or 4. Additionally, by scaling up the backbone of PeriodWave from 29M to 70M parameters for improved generalization, PeriodWave-Turbo achieves unprecedented performance, with a perceptual evaluation of speech quality (PESQ) score of 4.454 on the LibriTTS dataset. Audio samples, source code, and checkpoints will be available at https://github.com/sh-lee-prml/PeriodWave.
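To illustrate the fine-tuning recipe the abstract describes (a pre-trained CFM vector field turned into a fixed-step generator and optimized with a reconstruction loss plus adversarial feedback), the sketch below shows one possible PyTorch training step. It is a minimal sketch under stated assumptions, not the paper's implementation: `vector_field(x, t, mel)`, `discriminator`, `mel_fn`, the hinge-style adversarial objective, and the `lambda_mel` weight are illustrative placeholders; the actual PeriodWave-Turbo losses and discriminators are defined in the official repository.

```python
import torch
import torch.nn.functional as F

def fixed_step_generate(vector_field, mel, num_steps=4, sample_len=16000):
    """Few-step Euler sampler: integrate the conditional vector field
    from noise (t=0) to waveform (t=1) in a fixed number of steps."""
    x = torch.randn(mel.size(0), 1, sample_len, device=mel.device)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((mel.size(0),), i * dt, device=mel.device)
        v = vector_field(x, t, mel)   # predicted velocity at (x, t | mel)
        x = x + dt * v                # Euler update toward the data endpoint
    return x

def turbo_finetune_step(vector_field, discriminator, mel_fn,
                        g_opt, d_opt, mel, wav_real,
                        num_steps=4, lambda_mel=45.0):
    """One adversarial flow matching fine-tuning iteration:
    mel reconstruction loss plus adversarial feedback on the
    few-step generator's output (assumed losses, for illustration)."""
    # ---- discriminator update (hinge loss) ----
    wav_fake = fixed_step_generate(vector_field, mel, num_steps,
                                   sample_len=wav_real.size(-1))
    d_loss = (F.relu(1.0 - discriminator(wav_real)).mean()
              + F.relu(1.0 + discriminator(wav_fake.detach())).mean())
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # ---- generator (vector field) update ----
    wav_fake = fixed_step_generate(vector_field, mel, num_steps,
                                   sample_len=wav_real.size(-1))
    mel_loss = F.l1_loss(mel_fn(wav_fake), mel_fn(wav_real))  # reconstruction term
    adv_loss = -discriminator(wav_fake).mean()                # fool the discriminator
    g_loss = lambda_mel * mel_loss + adv_loss
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```

With `num_steps` set to 2 or 4, the same sampler used during fine-tuning can be reused at inference time, which is how the few-step generation described above would be realized in this sketch.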