HyperAI
Audio Generation On Audiocaps
Metrics
FAD
FD
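The page lists FAD and FD without defining them. In text-to-audio work these abbreviations usually denote Fréchet Audio Distance and Fréchet Distance: each fits a Gaussian to embeddings of reference audio and of generated audio, then measures the distance between the two fits, so lower values are better. Which embedding model each entry used is not stated on this page. The symbols below ($\mu_r, \Sigma_r$ for the reference statistics, $\mu_g, \Sigma_g$ for the generated ones) are notation introduced here for illustration, not taken from the page.

```latex
\[
F = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
\]
```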
Results
Performance results of different models on this benchmark
| Model Name | FAD | FD | Paper Title | Repository |
| --- | --- | --- | --- | --- |
| Make-An-Audio 2 | 1.80 | 11.75 | Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation | |
| Stable Audio | - | - | Fast Timing-Conditioned Latent Audio Diffusion | |
| Audiobox Sound | 0.77 | 8.30 | Audiobox: Unified Audio Generation with Natural Language Prompts | - |
| GenAu-Large | 1.21 | 16.51 | Taming Data and Transformers for Audio Generation | |
| Tango-AF&AC-FT-AC | 2.54 | 17.19 | Improving Text-To-Audio Models with Synthetic Captions | |
| AudioLDM 2-AC-Large | 1.42 | - | AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | |
| Re-AudioLDM-L | 1.37 | - | Retrieval-Augmented Text-to-Audio Generation | - |
| Auffusion-Full | 1.76 | 23.08 | Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation | |
| ETTA | 2.51 | 13.12 | ETTA: Elucidating the Design Space of Text-to-Audio Models | |
| AudioGen | 3.13 | - | AudioGen: Textually Guided Audio Generation | |
| Make-An-Audio | 2.66 | 18.32 | Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models | |
| TangoFlux | - | - | TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization | |
| ETTA-FT-AC-100k | 2.03 | 10.10 | ETTA: Elucidating the Design Space of Text-to-Audio Models | |
| Diffsound | 7.75 | 47.68 | Diffsound: Discrete Diffusion Model for Text-to-sound Generation | |
| AudioLDM2-large | 2.02 | 26.18 | AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | |
| Stable Audio 2.0 | - | - | Long-form music generation with latent diffusion | |
| Consistency TTA (Single-step generation) | 2.18 | 20.44 | ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation | |
| Auffusion | 1.63 | 21.99 | Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation | |
| CoDi | 1.80 | 22.90 | Any-to-Any Generation via Composable Diffusion | |
| AudioLDM-L-Full | 1.96 | 23.31 | AudioLDM: Text-to-Audio Generation with Latent Diffusion Models | |
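The results above are listed in page order rather than by score. As a minimal sketch, assuming lower FAD and FD are better (the usual convention for Fréchet-style distances) and treating "-" cells as missing, the rows can be ranked as follows. The names `ROWS` and `rank_by` are hypothetical helpers introduced for this example and are not part of the site.

```python
# Rows transcribed from the results above: (model, FAD, FD).
# None marks a "-" cell, i.e. a score that the page does not report.
ROWS = [
    ("Make-An-Audio 2", 1.80, 11.75),
    ("Stable Audio", None, None),
    ("Audiobox Sound", 0.77, 8.30),
    ("GenAu-Large", 1.21, 16.51),
    ("Tango-AF&AC-FT-AC", 2.54, 17.19),
    ("AudioLDM 2-AC-Large", 1.42, None),
    ("Re-AudioLDM-L", 1.37, None),
    ("Auffusion-Full", 1.76, 23.08),
    ("ETTA", 2.51, 13.12),
    ("AudioGen", 3.13, None),
    ("Make-An-Audio", 2.66, 18.32),
    ("TangoFlux", None, None),
    ("ETTA-FT-AC-100k", 2.03, 10.10),
    ("Diffsound", 7.75, 47.68),
    ("AudioLDM2-large", 2.02, 26.18),
    ("Stable Audio 2.0", None, None),
    ("Consistency TTA (Single-step generation)", 2.18, 20.44),
    ("Auffusion", 1.63, 21.99),
    ("CoDi", 1.80, 22.90),
    ("AudioLDM-L-Full", 1.96, 23.31),
]


def rank_by(rows, column):
    """Return rows that report `column` ("FAD" or "FD"), best (lowest) first.

    Assumption: lower is better for both metrics. Rows with a missing
    score are dropped rather than ranked last.
    """
    idx = {"FAD": 1, "FD": 2}[column]
    scored = [row for row in rows if row[idx] is not None]
    # sorted() is stable, so ties keep their original page order.
    return sorted(scored, key=lambda row: row[idx])


if __name__ == "__main__":
    for place, (model, fad, _fd) in enumerate(rank_by(ROWS, "FAD"), start=1):
        print(f"{place:2d}. {model}: FAD {fad:.2f}")
```

Dropping rows with missing scores keeps the ranking honest: an entry such as Stable Audio has no FAD on this page, so it cannot be placed relative to the others.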