HyperAI초신경

Audio Generation on AudioCaps

Evaluation Metrics

FAD (Fréchet Audio Distance; lower is better)
FD (Fréchet Distance; lower is better)
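Both metrics follow the same recipe. Fit a Gaussian to embeddings of reference audio and another to embeddings of generated audio, then take the Fréchet distance between the two: ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^{1/2}). The two metrics differ mainly in which embedding model produces the features. VGGish is the usual choice for FAD, and AudioLDM-style evaluations use PANNs for FD, though individual papers may differ. The sketch below assumes the embeddings are already extracted as NumPy arrays (one row per clip). The function names `frechet_distance` and `embedding_distance` are hypothetical and not taken from any benchmark's official code.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a positive semi-definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(mat)
    w = np.clip(w, 0.0, None)  # drop tiny negative eigenvalues caused by round-off
    return (v * np.sqrt(w)) @ v.T


def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """Fréchet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    sigma1, sigma2 = np.asarray(sigma1, float), np.asarray(sigma2, float)
    diff = mu1 - mu2
    # Tr((S1 S2)^{1/2}) equals Tr((S1^{1/2} S2 S1^{1/2})^{1/2}); the inner matrix is
    # symmetric PSD, so its eigenvalues are real and non-negative.
    s1 = _sqrtm_psd(sigma1)
    inner = s1 @ sigma2 @ s1
    inner = (inner + inner.T) / 2.0  # re-symmetrize against floating-point drift
    eig = np.linalg.eigvalsh(inner)
    tr_covmean = np.sqrt(np.clip(eig, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def embedding_distance(real_emb, gen_emb) -> float:
    """Fit a Gaussian to each embedding set (rows = clips) and compare them."""
    real_emb, gen_emb = np.asarray(real_emb, float), np.asarray(gen_emb, float)
    return frechet_distance(
        real_emb.mean(axis=0), np.cov(real_emb, rowvar=False),
        gen_emb.mean(axis=0), np.cov(gen_emb, rowvar=False),
    )
```

For example, two unit-covariance Gaussians whose means differ by the vector (3, 4) are 25 apart, because only the squared mean difference contributes. A lower score therefore means the generated audio's embedding statistics sit closer to those of the reference set.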

Evaluation Results

Performance of each model on this benchmark

| Model | FAD | FD | Paper Title |
|---|---|---|---|
| Make-An-Audio 2 | 1.80 | 11.75 | Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation |
| Stable Audio | - | - | Fast Timing-Conditioned Latent Audio Diffusion |
| Audiobox Sound | 0.77 | 8.30 | Audiobox: Unified Audio Generation with Natural Language Prompts |
| GenAu-Large | 1.21 | 16.51 | Taming Data and Transformers for Audio Generation |
| Tango-AF&AC-FT-AC | 2.54 | 17.19 | Improving Text-To-Audio Models with Synthetic Captions |
| AudioLDM 2-AC-Large | 1.42 | - | AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining |
| Re-AudioLDM-L | 1.37 | - | Retrieval-Augmented Text-to-Audio Generation |
| Auffusion-Full | 1.76 | 23.08 | Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation |
| ETTA | 2.51 | 13.12 | ETTA: Elucidating the Design Space of Text-to-Audio Models |
| AudioGen | 3.13 | - | AudioGen: Textually Guided Audio Generation |
| Make-An-Audio | 2.66 | 18.32 | Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models |
| TangoFlux | - | - | TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization |
| ETTA-FT-AC-100k | 2.03 | 10.10 | ETTA: Elucidating the Design Space of Text-to-Audio Models |
| Diffsound | 7.75 | 47.68 | Diffsound: Discrete Diffusion Model for Text-to-sound Generation |
| AudioLDM2-large | 2.02 | 26.18 | AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining |
| Stable Audio 2.0 | - | - | Long-form music generation with latent diffusion |
| Consistency TTA (Single-step generation) | 2.18 | 20.44 | ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation |
| Auffusion | 1.63 | 21.99 | Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation |
| CoDi | 1.80 | 22.90 | Any-to-Any Generation via Composable Diffusion |
| AudioLDM-L-Full | 1.96 | 23.31 | AudioLDM: Text-to-Audio Generation with Latent Diffusion Models |