HyperAI

Video Generation on UCF-101

Metrics

FVD16
Inception Score
KVD16

Results

Performance results of various models on this benchmark

| Model name | FVD16 | Inception Score | KVD16 | Paper Title | Repository |
|---|---|---|---|---|---|
| DIGAN (128x128, class-conditional) | 465 | 59.68 | 39.6 | Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks | |
| MCVD (64x64) | 1143 | - | - | MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation | |
| MAGVIT (AR) | 265 | - | - | MAGVIT: Masked Generative Video Transformer | |
| PYoCo (Zero-shot, 64x64, text-conditional) | 355.19 | 47.76 | - | Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models | - |
| LVDM (256x256, unconditional) | 552 | - | 42 | Latent Video Diffusion Models for High-Fidelity Long Video Generation | |
| TATS (128x128, class-conditional) | 332 | 79.28 | - | Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer | |
| PixelDance (256x256, text-conditional) | 242.82 | 42.10 | - | Make Pixels Dance: High-Dynamic Video Generation | - |
| ACDiT | 90 | - | - | ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer | |
| Lumiere (Zero-shot, 1024x1024, text-conditional) | 332.49 | 37.54 | - | Lumiere: A Space-Time Diffusion Model for Video Generation | |
| MMVG (128x128, class-conditional) | 328 | 73.7 | - | Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation | |
| MAGVIT (-L-CG, 128x128, class-conditional) | 76±2 | 89.27±0.15 | - | MAGVIT: Masked Generative Video Transformer | |
| Make-A-Video (Zero-shot, 256x256, class-conditional) | 367.23 | 33 | - | Make-A-Video: Text-to-Video Generation without Text-Video Data | |
| OmniTokenizer-AR | 191 | - | - | OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation | |
| Make-A-Video (Finetuning, 256x256, class-conditional) | 81.25 | 82.55 | - | Make-A-Video: Text-to-Video Generation without Text-Video Data | |
| GridDiff (Zero-shot) | 340.0 | 62.88 | - | Grid Diffusion Models for Text-to-Video Generation | - |
| VideoFusion (128x128, unconditional) | 220 | 72.22 | - | VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation | |
| VDM | 1396 | - | 116 | Latent Video Diffusion Models for High-Fidelity Long Video Generation | |
| VideoAssembler (Zero-shot, 256x256, class-conditional) | 346.84 | 48.01 | - | MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing | |
| Video-LaVIT | 280.57 | 44.26 | - | Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | |
| MAGVIT-v2 | 58±3 | - | - | Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation | |