Text-to-Video Generation on MSR-VTT
Evaluation metrics
CLIPSIM
FID
FVD
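CLIPSIM is generally computed as the average CLIP similarity between the text prompt and each frame of the generated video, so higher is better, while FID and FVD are distribution distances where lower is better. A minimal sketch of the averaging step, assuming the text and frame embeddings have already been produced by a CLIP model (the embedding step is not shown, and the function names `cosine_similarity` and `clipsim` are hypothetical):

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clipsim(text_embedding, frame_embeddings):
    """Average text-to-frame similarity over all frames of one video.

    Assumption: both inputs are precomputed CLIP embeddings; this sketch
    only illustrates the averaging, not any benchmark's exact protocol.
    """
    sims = [cosine_similarity(text_embedding, f) for f in frame_embeddings]
    return sum(sims) / len(sims)


# Toy 2-D vectors standing in for real embeddings.
score = clipsim([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(score)
```

Benchmark scores are typically averaged again over all prompts in the test set; the exact frame sampling and CLIP variant differ between papers.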
Evaluation results
Performance of each model on this benchmark
| Model | CLIPSIM | FID | FVD | Paper Title | Repository |
| --- | --- | --- | --- | --- | --- |
| ModelScopeT2V | 0.2930 | 11.09 | 550 | ModelScope Text-to-Video Technical Report | |
| Video LDM | 0.2929 | - | - | Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models | |
| TF-T2V | 0.2991 | 8.19 | 441 | A Recipe for Scaling up Text-to-Video Generation with Text-free Videos | |
| NUWA | 0.2439 | 47.68 | - | NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | |
| PixelDance | 0.3125 | - | 381 | Make Pixels Dance: High-Dynamic Video Generation | - |
| Make-A-Video | 0.3049 | 13.17 | - | Make-A-Video: Text-to-Video Generation without Text-Video Data | |
| GODIVA | 0.2402 | - | - | GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions | |
| MMVG | 0.2644 | 23.4 | - | Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation | |
| CogVideo (English) | 0.2631 | 23.59 | - | Make-A-Video: Text-to-Video Generation without Text-Video Data | |
| Snap Video (512x288) | 0.2793 | - | 104.0 | Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis | - |
| VideoPoet | 0.3123 | - | 213 | VideoPoet: A Large Language Model for Zero-Shot Video Generation | |
| MagicVideo | - | 36.5 | 998 | MagicVideo: Efficient Video Generation With Latent Diffusion Models | - |
| HiGen | 0.2947 | 8.60 | 406 | Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation | |
| Video-LaVIT | 0.3012 | 11.27 | 188.36 | Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | |
| CogVideo (Chinese) | 0.2614 | - | - | Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models | |
| VideoComposer | 0.2932 | - | 580 | VideoComposer: Compositional Video Synthesis with Motion Controllability | |
| Show-1 | 0.3072 | 13.08 | 538 | Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation | |
| Snap Video (288×288) | 0.2793 | - | 110.4 | Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis | - |
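Because many entries report only some of the three metrics, comparing models requires skipping the "-" cells and remembering that CLIPSIM is higher-is-better while FID and FVD are lower-is-better. A small sketch of that ranking over the results listed above (the names `RAW_ROWS`, `parse`, and `rank` are illustrative, and the ordering only reflects the reported numbers, not differences in evaluation protocol between papers):

```python
from typing import List, Optional, Tuple

# (model, CLIPSIM, FID, FVD) copied from the results above; "-" = not reported.
RAW_ROWS = [
    ("ModelScopeT2V", "0.2930", "11.09", "550"),
    ("Video LDM", "0.2929", "-", "-"),
    ("TF-T2V", "0.2991", "8.19", "441"),
    ("NUWA", "0.2439", "47.68", "-"),
    ("PixelDance", "0.3125", "-", "381"),
    ("Make-A-Video", "0.3049", "13.17", "-"),
    ("GODIVA", "0.2402", "-", "-"),
    ("MMVG", "0.2644", "23.4", "-"),
    ("CogVideo (English)", "0.2631", "23.59", "-"),
    ("Snap Video (512x288)", "0.2793", "-", "104.0"),
    ("VideoPoet", "0.3123", "-", "213"),
    ("MagicVideo", "-", "36.5", "998"),
    ("HiGen", "0.2947", "8.60", "406"),
    ("Video-LaVIT", "0.3012", "11.27", "188.36"),
    ("CogVideo (Chinese)", "0.2614", "-", "-"),
    ("VideoComposer", "0.2932", "-", "580"),
    ("Show-1", "0.3072", "13.08", "538"),
    ("Snap Video (288×288)", "0.2793", "-", "110.4"),
]

# metric name -> (column index in RAW_ROWS, whether higher values are better)
METRICS = {"CLIPSIM": (1, True), "FID": (2, False), "FVD": (3, False)}


def parse(value: str) -> Optional[float]:
    """Turn a table cell into a float, or None when the cell is '-'."""
    return None if value == "-" else float(value)


def rank(metric: str) -> List[Tuple[str, float]]:
    """Return (model, score) pairs that report `metric`, best first."""
    index, higher_is_better = METRICS[metric]
    scored = [(row[0], parse(row[index])) for row in RAW_ROWS]
    reported = [(name, v) for name, v in scored if v is not None]
    return sorted(reported, key=lambda pair: pair[1], reverse=higher_is_better)


for metric in METRICS:
    best_model, best_score = rank(metric)[0]
    print(f"{metric}: {best_model} ({best_score})")
```

Ranking each metric separately avoids penalizing a model for a number it never reported; a combined ranking would need an explicit rule for the missing cells.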