HyperAI초신경
Audio Generation on AudioCaps
Evaluation Metrics (lower is better)
FAD (Fréchet Audio Distance)
FD (Fréchet Distance)
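FAD and FD are both Fréchet distances: each fits a Gaussian to embeddings of reference audio and of generated audio, then measures how far apart the two Gaussians are. The sketch below shows that computation with NumPy only. It is an illustration under assumptions, not the evaluation code behind this leaderboard: the page does not say which embedding model each metric uses, so the random arrays stand in for real audio embeddings, and the function name `frechet_distance` is hypothetical.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two embedding sets.

    Each input has shape (n_samples, n_dims).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite inputs. Clip tiny negative numerical noise.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


# Placeholder embeddings (assumption): real use would extract these from audio.
rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 8))
generated = rng.normal(loc=0.5, size=(500, 8))

print(frechet_distance(reference, reference))  # approximately zero
print(frechet_distance(reference, generated))  # clearly larger
```

Identical sets score approximately zero, and the score grows as the generated distribution drifts from the reference, which is why lower values indicate better generation quality.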
Evaluation Results
Performance results of each model on this benchmark
| Model Name | FAD | FD | Paper Title | Repository |
|---|---|---|---|---|
| Make-An-Audio 2 | 1.80 | 11.75 | Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation | |
| Stable Audio | - | - | Fast Timing-Conditioned Latent Audio Diffusion | |
| Audiobox Sound | 0.77 | 8.30 | Audiobox: Unified Audio Generation with Natural Language Prompts | - |
| GenAu-Large | 1.21 | 16.51 | Taming Data and Transformers for Audio Generation | |
| Tango-AF&AC-FT-AC | 2.54 | 17.19 | Improving Text-To-Audio Models with Synthetic Captions | |
| AudioLDM 2-AC-Large | 1.42 | - | AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | |
| Re-AudioLDM-L | 1.37 | - | Retrieval-Augmented Text-to-Audio Generation | - |
| Auffusion-Full | 1.76 | 23.08 | Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation | |
| ETTA | 2.51 | 13.12 | ETTA: Elucidating the Design Space of Text-to-Audio Models | - |
| AudioGen | 3.13 | - | AudioGen: Textually Guided Audio Generation | |
| Make-An-Audio | 2.66 | 18.32 | Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models | |
| TangoFlux | - | - | TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization | |
| ETTA-FT-AC-100k | 2.03 | 10.10 | ETTA: Elucidating the Design Space of Text-to-Audio Models | - |
| Diffsound | 7.75 | 47.68 | Diffsound: Discrete Diffusion Model for Text-to-sound Generation | |
| AudioLDM2-large | 2.02 | 26.18 | AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | |
| Stable Audio 2.0 | - | - | Long-form music generation with latent diffusion | |
| Consistency TTA (Single-step generation) | 2.18 | 20.44 | ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation | - |
| Auffusion | 1.63 | 21.99 | Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation | |
| CoDi | 1.80 | 22.90 | Any-to-Any Generation via Composable Diffusion | |
| AudioLDM-L-Full | 1.96 | 23.31 | AudioLDM: Text-to-Audio Generation with Latent Diffusion Models | |
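The results are listed in no particular order and several entries report "-" for one or both metrics. A minimal sketch of ranking them is shown below, assuming a missing score simply excludes a model from that metric's ranking. The helper name `rank_by` is hypothetical, and only a handful of rows are copied from the results above to keep the example short.

```python
# (model, FAD, FD) rows copied from the results above; None marks a "-" entry.
rows = [
    ("Make-An-Audio 2", 1.80, 11.75),
    ("Stable Audio", None, None),
    ("Audiobox Sound", 0.77, 8.30),
    ("GenAu-Large", 1.21, 16.51),
    ("Re-AudioLDM-L", 1.37, None),
]


def rank_by(rows, index):
    """Sort rows ascending by the score at `index`, skipping missing scores."""
    scored = [row for row in rows if row[index] is not None]
    return sorted(scored, key=lambda row: row[index])


by_fad = rank_by(rows, 1)  # lower FAD is better
by_fd = rank_by(rows, 2)   # lower FD is better

for model, fad, _ in by_fad:
    print(f"{model}: FAD {fad:.2f}")
```

Ranking each metric separately avoids comparing models on a score they never reported.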