Speech Synthesis on LibriTTS
Metrics
M-STFT
MCD
PESQ
Periodicity
V/UV F1
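
To illustrate the last metric in the list: V/UV F1 is usually understood as the F1 score of a voiced/unvoiced classification, comparing per-frame voicing decisions on the generated audio against those on the reference audio. The sketch below assumes the voicing decisions have already been extracted by a pitch tracker and are given as equal-length boolean sequences; the function name `vuv_f1` and the choice of "voiced" as the positive class are assumptions made for illustration, not the benchmark's official evaluation code.

```python
def vuv_f1(ref_voiced, gen_voiced):
    """F1 score over per-frame voicing flags, with 'voiced' as the positive class.

    Hypothetical helper: both arguments are equal-length sequences of booleans,
    one flag per analysis frame (True = voiced, False = unvoiced).
    """
    if len(ref_voiced) != len(gen_voiced):
        raise ValueError("reference and generated sequences must have equal length")

    pairs = list(zip(ref_voiced, gen_voiced))
    tp = sum(1 for r, g in pairs if r and g)          # voiced in both
    fp = sum(1 for r, g in pairs if not r and g)      # voiced only in generated
    fn = sum(1 for r, g in pairs if r and not g)      # voiced only in reference

    denom = 2 * tp + fp + fn
    # With no voiced frames on either side, F1 is undefined; return 0.0 by convention here.
    return 2 * tp / denom if denom else 0.0


ref = [True, True, False, False, True]
gen = [True, False, False, True, True]
score = vuv_f1(ref, gen)  # 2 true positives, 1 false positive, 1 false negative
```

A score of 1.0 means the generated audio reproduces the reference voicing pattern frame for frame, so higher values are better.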
Results
Performance results of various models on this benchmark
Comparison Table
Model Name | M-STFT | MCD | PESQ | Periodicity | V/UV F1 |
---|---|---|---|---|---|
bigvsan-enhancing-gan-based-neural-vocoders-1 | 0.7881 | 0.3381 | 4.116 | 0.0935 | 0.9635 |
bigvgan-a-universal-neural-vocoder-with-large | 0.7026 | 0.2903 | 4.362 | 0.0593 | 0.9793 |
periodwave-multi-period-flow-matching-for | 1.0269 | - | 4.248 | 0.0765 | 0.9651 |
rfwave-multi-band-rectified-flow-for-audio | - | - | 4.228 | 0.090 | 0.968 |
waveglow-a-flow-based-generative-network-for | 1.3099 | 2.3591 | 3.138 | 0.1485 | 0.9378 |
vocos-closing-the-gap-between-time-domain-and | - | - | 3.70 | 0.101 | 0.9582 |
waveflow-a-compact-flow-based-model-for-raw-1 | 1.1120 | 1.2455 | 3.027 | 0.1416 | 0.9410 |
bigvsan-enhancing-gan-based-neural-vocoders-1 | 0.7992 | 0.4129 | 4.120 | 0.0924 | 0.9644 |
bigvgan-a-universal-neural-vocoder-with-large | 0.7997 | 0.3745 | 4.027 | 0.1018 | 0.9598 |
speaker-conditional-wavernn-towards-universal | 2.2358 | 1.8854 | 1.701 | 0.3044 | 0.8144 |
accelerating-high-fidelity-waveform | 0.7358 | - | 4.454 | 0.0528 | 0.9756 |
eva-gan-enhanced-various-audio-generation-via | 0.7982 | - | 4.3536 | 0.0751 | 0.9745 |
bigvgan-a-universal-neural-vocoder-with-large | 0.8788 | 0.4564 | 3.519 | 0.1287 | 0.9459 |
hifi-gan-generative-adversarial-networks-for | 1.0017 | 0.6603 | 2.947 | 0.1565 | 0.9300 |
eva-gan-enhanced-various-audio-generation-via | 0.9485 | - | 4.0330 | 0.0942 | 0.9658 |
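
The table mixes metrics where higher is better (PESQ, V/UV F1) with metrics where lower is better (M-STFT, MCD, Periodicity), and some cells are `-` because the value was not reported. A minimal sketch of how the rows could be parsed and ranked under those assumptions follows; the helper names `parse_table` and `rank` are hypothetical, and the sample uses three rows copied from the table above.

```python
def parse_table(text):
    """Parse a pipe-delimited table into a list of dicts; '-' becomes None."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[2:]:  # skip the header and the ---|--- separator line
        cells = [c.strip() for c in ln.strip("|").split("|")]
        row = {header[0]: cells[0]}
        for name, cell in zip(header[1:], cells[1:]):
            row[name] = None if cell == "-" else float(cell)
        rows.append(row)
    return rows


def rank(rows, metric, higher_is_better):
    """Sort rows by one metric, leaving out rows where it was not reported."""
    scored = [r for r in rows if r[metric] is not None]
    return sorted(scored, key=lambda r: r[metric], reverse=higher_is_better)


SAMPLE = """
Model Name | M-STFT | MCD | PESQ | Periodicity | V/UV F1 |
---|---|---|---|---|---|
bigvgan-a-universal-neural-vocoder-with-large | 0.7026 | 0.2903 | 4.362 | 0.0593 | 0.9793 |
rfwave-multi-band-rectified-flow-for-audio | - | - | 4.228 | 0.090 | 0.968 |
accelerating-high-fidelity-waveform | 0.7358 | - | 4.454 | 0.0528 | 0.9756 |
"""

rows = parse_table(SAMPLE)
by_pesq = rank(rows, "PESQ", higher_is_better=True)
by_mstft = rank(rows, "M-STFT", higher_is_better=False)
```

Ranking on a single metric drops entries with missing values, so orderings by different metrics may cover different subsets of the table.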