Video-to-Sound Generation on VGGSound
Metrics
FAD
FD
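For context, both metrics are Fréchet distances, and lower values indicate generated audio whose embedding statistics lie closer to those of real audio. A sketch of the usual definition follows, assuming Gaussians fitted to embeddings of real and generated audio. The symbols $\mu_r, \Sigma_r$ (real) and $\mu_g, \Sigma_g$ (generated) are notation introduced here, not taken from this page.

```latex
\mathrm{FD} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

FAD (Fréchet Audio Distance) is conventionally this same quantity computed on VGGish embeddings. The embedding model behind the FD column is not stated on this page and may differ between papers, so scores in that column are best compared with that caveat in mind.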
Results
Performance results of various models on this benchmark
Model Name | FAD | FD | Paper Title | Repository |
---|---|---|---|---|
ReWas | 2.16 | 15.24 | Read, Watch and Scream! Sound Generation from Text and Video | - |
Frieren | 1.32 | 12.26 | Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching | |
MMAudio-S-16kHz | 0.79 | 5.22 | Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis | |
MaskVAT_Hybrid | 2.04 | - | Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity | - |
MMAudio-L-44.1kHz | 0.97 | 4.72 | Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis | |
V-AURA | 1.92 | - | Temporally Aligned Audio for Video with Autoregression | |
V2A-Mapper | 0.841 | 24.168 | V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models | |
VATT-LLama | 2.38 | - | Tell What You Hear From What You See -- Video to Audio Generation Through Text | - |