HyperAI

Video Generation on UCF-101

Metrics

FVD16
Inception Score
KVD16

Results

Performance results of various models on this benchmark

| Model Name | FVD16 | Inception Score | KVD16 | Paper Title | Repository |
| --- | --- | --- | --- | --- | --- |
| DIGAN (128x128, class-conditional) | 465 | 59.68 | 39.6 | Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks | - |
| MCVD (64x64) | 1143 | - | - | MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation | - |
| MAGVIT (AR) | 265 | - | - | MAGVIT: Masked Generative Video Transformer | - |
| PYoCo (Zero-shot, 64x64, text-conditional) | 355.19 | 47.76 | - | Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models | - |
| LVDM (256x256, unconditional) | 552 | - | 42 | Latent Video Diffusion Models for High-Fidelity Long Video Generation | - |
| TATS (128x128, class-conditional) | 332 | 79.28 | - | Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer | - |
| PixelDance (256x256, text-conditional) | 242.82 | 42.10 | - | Make Pixels Dance: High-Dynamic Video Generation | - |
| ACDiT | 90 | - | - | ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer | - |
| Lumiere (Zero-shot, 1024x1024, text-conditional) | 332.49 | 37.54 | - | Lumiere: A Space-Time Diffusion Model for Video Generation | - |
| MMVG (128x128, class-conditional) | 328 | 73.7 | - | Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation | - |
| MAGVIT (-L-CG, 128x128, class-conditional) | 76±2 | 89.27±0.15 | - | MAGVIT: Masked Generative Video Transformer | - |
| Make-A-Video (Zero-shot, 256x256, class-conditional) | 367.23 | 33 | - | Make-A-Video: Text-to-Video Generation without Text-Video Data | - |
| OmniTokenizer-AR | 191 | - | - | OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation | - |
| Make-A-Video (Finetuning, 256x256, class-conditional) | 81.25 | 82.55 | - | Make-A-Video: Text-to-Video Generation without Text-Video Data | - |
| GridDiff (Zero-shot) | 340.062.88 | | - | Grid Diffusion Models for Text-to-Video Generation | - |
| VideoFusion (128x128, unconditional) | 220 | 72.22 | - | VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation | - |
| VDM | 1396 | - | 116 | Latent Video Diffusion Models for High-Fidelity Long Video Generation | - |
| VideoAssembler (Zero-shot, 256x256, class-conditional) | 346.84 | 48.01 | - | MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing | - |
| Video-LaVIT | 280.57 | 44.26 | - | Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | - |
| MAGVIT-v2 | 58±3 | - | - | Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation | - |