HyperAI

Text-to-Video Generation on MSR-VTT

Metrics

CLIPSIM
FID
FVD

Results

Performance results of different models on this benchmark

| Model Name | CLIPSIM | FID | FVD | Paper Title | Repository |
|---|---|---|---|---|---|
| ModelScopeT2V | 0.2930 | 11.09 | 550 | ModelScope Text-to-Video Technical Report | |
| Video LDM | 0.2929 | - | - | Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models | |
| TF-T2V | 0.2991 | 8.19 | 441 | A Recipe for Scaling up Text-to-Video Generation with Text-free Videos | |
| NUWA | 0.2439 | 47.68 | - | NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | |
| PixelDance | 0.3125 | - | 381 | Make Pixels Dance: High-Dynamic Video Generation | - |
| Make-A-Video | 0.3049 | 13.17 | - | Make-A-Video: Text-to-Video Generation without Text-Video Data | |
| GODIVA | 0.2402 | - | - | GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions | |
| MMVG | 0.2644 | 23.4 | - | Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation | |
| CogVideo (English) | 0.2631 | 23.59 | - | Make-A-Video: Text-to-Video Generation without Text-Video Data | |
| Snap Video (512x288) | 0.2793 | - | 104.0 | Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis | - |
| VideoPoet | 0.3123 | - | 213 | VideoPoet: A Large Language Model for Zero-Shot Video Generation | |
| MagicVideo | - | 36.5 | 998 | MagicVideo: Efficient Video Generation With Latent Diffusion Models | - |
| HiGen | 0.2947 | 8.60 | 406 | Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation | |
| Video-LaVIT | 0.3012 | 11.27 | 188.36 | Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | |
| CogVideo (Chinese) | 0.2614 | - | - | Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models | |
| VideoComposer | 0.2932 | - | 580 | VideoComposer: Compositional Video Synthesis with Motion Controllability | |
| Show-1 | 0.3072 | 13.08 | 538 | Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation | |
| Snap Video (288×288) | 0.2793 | - | 110.4 | Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis | - |
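Because the three metrics point in different directions and several entries are missing, ranking the rows by hand is error-prone. The following is a minimal sketch of one way to rank the models per metric. It assumes the usual convention that higher CLIPSIM is better while lower FID and FVD are better. The names `ROWS`, `HIGHER_IS_BETTER`, `COLUMN`, and `rank_by` are illustrative and not part of the leaderboard. The scores come from different papers and may not share identical evaluation settings, so any ordering should be read as indicative only.

```python
# Rows copied from the leaderboard: (model, CLIPSIM, FID, FVD).
# None marks a metric the leaderboard leaves blank ("-").
ROWS = [
    ("ModelScopeT2V", 0.2930, 11.09, 550.0),
    ("Video LDM", 0.2929, None, None),
    ("TF-T2V", 0.2991, 8.19, 441.0),
    ("NUWA", 0.2439, 47.68, None),
    ("PixelDance", 0.3125, None, 381.0),
    ("Make-A-Video", 0.3049, 13.17, None),
    ("GODIVA", 0.2402, None, None),
    ("MMVG", 0.2644, 23.4, None),
    ("CogVideo (English)", 0.2631, 23.59, None),
    ("Snap Video (512x288)", 0.2793, None, 104.0),
    ("VideoPoet", 0.3123, None, 213.0),
    ("MagicVideo", None, 36.5, 998.0),
    ("HiGen", 0.2947, 8.60, 406.0),
    ("Video-LaVIT", 0.3012, 11.27, 188.36),
    ("CogVideo (Chinese)", 0.2614, None, None),
    ("VideoComposer", 0.2932, None, 580.0),
    ("Show-1", 0.3072, 13.08, 538.0),
    ("Snap Video (288x288)", 0.2793, None, 110.4),
]

# Assumption: higher CLIPSIM is better; lower FID and FVD are better.
HIGHER_IS_BETTER = {"CLIPSIM": True, "FID": False, "FVD": False}

# Position of each metric inside a row tuple.
COLUMN = {"CLIPSIM": 1, "FID": 2, "FVD": 3}


def rank_by(metric):
    """Return (model, value) pairs sorted best-first, skipping missing values."""
    idx = COLUMN[metric]
    scored = [(row[0], row[idx]) for row in ROWS if row[idx] is not None]
    return sorted(scored, key=lambda pair: pair[1],
                  reverse=HIGHER_IS_BETTER[metric])


if __name__ == "__main__":
    for metric in COLUMN:
        best_model, best_value = rank_by(metric)[0]
        print(f"{metric}: best reported value is {best_value} ({best_model})")
```

Models with a missing value are left out of that metric's ranking rather than placed last, so a model with no reported FVD is not treated as worse than one with a high FVD.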