HyperAI초신경

Text-to-Video Generation on MSR-VTT

Evaluation Metrics

CLIPSIM

FID

FVD

Evaluation Results

Performance of each model on this benchmark

Model	CLIPSIM	FID	FVD	Paper Title
PixelDance	0.3125	-	381	Make Pixels Dance: High-Dynamic Video Generation
VideoPoet	0.3123	-	213	VideoPoet: A Large Language Model for Zero-Shot Video Generation
Show-1	0.3072	13.08	538	Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
Make-A-Video	0.3049	13.17	-	Make-A-Video: Text-to-Video Generation without Text-Video Data
Video-LaVIT	0.3012	11.27	188.36	Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
TF-T2V	0.2991	8.19	441	A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
HiGen	0.2947	8.60	406	Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
VideoComposer	0.2932	-	580	VideoComposer: Compositional Video Synthesis with Motion Controllability
ModelScopeT2V	0.2930	11.09	550	ModelScope Text-to-Video Technical Report
Video LDM	0.2929	-	-	Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
Snap Video (512×288)	0.2793	-	104.0	Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
Snap Video (288×288)	0.2793	-	110.4	Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
MMVG	0.2644	23.4	-	Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation
CogVideo (English)	0.2631	23.59	-	Make-A-Video: Text-to-Video Generation without Text-Video Data
CogVideo (Chinese)	0.2614	-	-	Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
NUWA	0.2439	47.68	-	NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
GODIVA	0.2402	-	-	GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions
MagicVideo	-	36.5	998	MagicVideo: Efficient Video Generation With Latent Diffusion Models

