HyperAI초신경

Video-to-Sound Generation on VGGSound

Evaluation Metrics

FAD

FD
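
Both metrics are Fréchet-style distances: FAD stands for Fréchet Audio Distance and FD for Fréchet Distance, and lower values are better for both. As a rough illustration of how such a score is typically computed, the sketch below fits a Gaussian to each of two sets of audio embeddings and evaluates the closed-form Fréchet distance between them. It assumes the embeddings are already extracted; which embedding model each metric uses is not stated on this page, and the function name `frechet_distance` is hypothetical, not taken from any of the listed papers.

```python
import numpy as np


def frechet_distance(x, y):
    """Fréchet distance between Gaussians fitted to two embedding sets.

    x, y: arrays of shape (n_samples, dim) holding precomputed embeddings
    (assumption: extraction happens elsewhere). Returns the value in the
    form usually reported for FAD/FID, i.e. without a final square root.
    """
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.atleast_2d(np.cov(x, rowvar=False))
    cov_y = np.atleast_2d(np.cov(y, rowvar=False))

    # Tr((cov_x cov_y)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_x @ cov_y; clip tiny negatives from round-off.
    eigvals = np.linalg.eigvals(cov_x @ cov_y)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)
```

In practice `x` would hold embeddings of reference audio and `y` embeddings of generated audio. Identical sets give a distance of approximately zero, and the score grows as the two distributions drift apart.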

Evaluation Results

Performance of each model on this benchmark

Model Name	FAD	FD	Paper Title	Repository
ReWas	2.16	15.24	Read, Watch and Scream! Sound Generation from Text and Video
Frieren	1.32	12.26	Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching
MMAudio-S-16kHz	0.79	5.22	Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
MaskVAT_Hybrid	2.04	-	Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity	-
MMAudio-L-44.1kHz	0.97	4.72	Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
V-AURA	1.92	-	Temporally Aligned Audio for Video with Autoregression
V2A-Mapper	0.841	24.168	V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
VATT-LLama	2.38	-	Tell What You Hear From What You See -- Video to Audio Generation Through Text
