Audio Generation on AudioCaps
Metrics
FAD (Fréchet Audio Distance; lower is better)
FD (Fréchet Distance; lower is better)
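Both metrics compare a Gaussian fitted to embeddings of reference audio with one fitted to embeddings of generated audio, using d² = ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). FAD is commonly computed on VGGish embeddings and FD on PANNs embeddings (as in the AudioLDM evaluation setup). The sketch below is a minimal, NumPy-only illustration that assumes the embeddings are already extracted; `frechet_distance` is a hypothetical helper, not any toolkit's API, and it does not reproduce a specific toolkit's preprocessing, so its numbers are not directly comparable with the table.

```python
import numpy as np


def frechet_distance(emb_real: np.ndarray, emb_gen: np.ndarray) -> float:
    """Hypothetical helper: Fréchet distance between Gaussians fitted to two
    embedding sets of shape (num_clips, embedding_dim)."""
    # Fit a Gaussian (mean vector, covariance matrix) to each embedding set.
    mu_r, mu_g = emb_real.mean(axis=0), emb_gen.mean(axis=0)
    sigma_r = np.cov(emb_real, rowvar=False)
    sigma_g = np.cov(emb_gen, rowvar=False)

    # Symmetric PSD square root of sigma_r via eigendecomposition.
    w, v = np.linalg.eigh(sigma_r)
    sqrt_r = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

    # Tr((sigma_r @ sigma_g)^{1/2}) equals the sum of square roots of the
    # eigenvalues of the symmetric PSD matrix sqrt_r @ sigma_g @ sqrt_r.
    m = sqrt_r @ sigma_g @ sqrt_r
    eig = np.linalg.eigvalsh((m + m.T) / 2.0)
    tr_covmean = np.sqrt(np.clip(eig, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_covmean)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 8))           # stand-in for reference embeddings
    fake = rng.normal(loc=0.5, size=(500, 8))  # stand-in for generated embeddings
    print(frechet_distance(real, real))  # close to 0.0
    print(frechet_distance(real, fake))  # larger, since the means differ
```

Taking the eigenvalues of Σ_r^{1/2} Σ_g Σ_r^{1/2} instead of calling a general matrix square root keeps the computation real-valued and avoids a SciPy dependency; the two give the same trace because Σ_r Σ_g and Σ_r^{1/2} Σ_g Σ_r^{1/2} share their eigenvalues.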
Results
Performance of various models on this benchmark. A dash (-) means no score is reported for that metric.
Comparison Table
Model Name | FAD | FD |
---|---|---|
make-an-audio-2-temporal-enhanced-text-to | 1.80 | 11.75 |
fast-timing-conditioned-latent-audio | - | - |
audiobox-unified-audio-generation-with | 0.77 | 8.30 |
taming-data-and-transformers-for-audio-1 | 1.21 | 16.51 |
2406-15487 | 2.54 | 17.19 |
audioldm-2-learning-holistic-audio-generation | 1.42 | - |
retrieval-augmented-text-to-audio-generation | 1.37 | - |
auffusion-leveraging-the-power-of-diffusion | 1.76 | 23.08 |
etta-elucidating-the-design-space-of-text-to | 2.51 | 13.12 |
audiogen-textually-guided-audio-generation | 3.13 | - |
make-an-audio-text-to-audio-generation-with | 2.66 | 18.32 |
tangoflux-super-fast-and-faithful-text-to | - | - |
etta-elucidating-the-design-space-of-text-to | 2.03 | 10.10 |
diffsound-discrete-diffusion-model-for-text | 7.75 | 47.68 |
audioldm-2-learning-holistic-audio-generation | 2.02 | 26.18 |
long-form-music-generation-with-latent | - | - |
accelerating-diffusion-based-text-to-audio | 2.18 | 20.44 |
auffusion-leveraging-the-power-of-diffusion | 1.63 | 21.99 |
any-to-any-generation-via-composable | 1.80 | 22.90 |
audioldm-text-to-audio-generation-with-latent | 1.96 | 23.31 |
text-to-audio-generation-using-instruction | 1.59 | 24.52 |
stable-audio-open | - | - |
tangoflux-super-fast-and-faithful-text-to | - | - |