Text-to-Video Generation on MSR-VTT
Metrics
CLIPSIM
FID
FVD
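The standard definitions of these metrics are sketched below as they are commonly stated. FID and FVD are both Fréchet distances between Gaussians fitted to feature embeddings of real (r) and generated (g) samples: Inception-v3 image features for FID, I3D video features for FVD. Lower values are better. CLIPSIM averages the CLIP cosine similarity between a caption t and each of the F generated frames x_f, then averages over all generated videos. Higher values are better. The symbols E_T (CLIP text encoder) and E_I (CLIP image encoder) are notation chosen here for illustration. Exact protocols (sample count, resolution, CLIP backbone) vary between papers, so treat this as a hedged summary rather than the evaluation code behind the numbers below.

```latex
\mathrm{FD}(r, g) = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)

\mathrm{CLIPSIM}(t, v) = \frac{1}{F} \sum_{f=1}^{F}
  \frac{E_T(t) \cdot E_I(x_f)}{\lVert E_T(t) \rVert \, \lVert E_I(x_f) \rVert}
```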
Results
Performance of various models on this benchmark
Comparison table
Model name | CLIPSIM (higher is better) | FID (lower is better) | FVD (lower is better)
---|---|---|---
modelscope-text-to-video-technical-report | 0.2930 | 11.09 | 550 |
align-your-latents-high-resolution-video | 0.2929 | - | - |
a-recipe-for-scaling-up-text-to-video | 0.2991 | 8.19 | 441 |
nuwa-visual-synthesis-pre-training-for-neural | 0.2439 | 47.68 | - |
make-pixels-dance-high-dynamic-video | 0.3125 | - | 381 |
make-a-video-text-to-video-generation-without | 0.3049 | 13.17 | - |
godiva-generating-open-domain-videos-from | 0.2402 | - | - |
tell-me-what-happened-unifying-text-guided | 0.2644 | 23.4 | - |
make-a-video-text-to-video-generation-without | 0.2631 | 23.59 | - |
snap-video-scaled-spatiotemporal-transformers | 0.2793 | - | 104.0 |
videopoet-a-large-language-model-for-zero | 0.3123 | - | 213 |
magicvideo-efficient-video-generation-with | - | 36.5 | 998 |
hierarchical-spatio-temporal-decoupling-for | 0.2947 | 8.60 | 406 |
video-lavit-unified-video-language-pre | 0.3012 | 11.27 | 188.36 |
align-your-latents-high-resolution-video | 0.2614 | - | - |
videocomposer-compositional-video-synthesis | 0.2932 | - | 580 |
show-1-marrying-pixel-and-latent-diffusion | 0.3072 | 13.08 | 538 |
snap-video-scaled-spatiotemporal-transformers | 0.2793 | - | 110.4 |
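To compare entries programmatically, the table can be parsed and ranked per metric. This is a minimal sketch under stated assumptions: `-` means the value was not reported (such rows are skipped for that metric), higher CLIPSIM and lower FID/FVD are better, and rows that share a model slug are kept as separate entries. The names `TABLE`, `parse_rows`, and `rank` are hypothetical helpers, not part of any benchmark tooling.

```python
# Hypothetical helper: parse the comparison table and rank entries per metric.
# Assumption: "-" marks a metric that was not reported for that entry.
# Assumption: higher CLIPSIM is better; lower FID and FVD are better.

TABLE = """\
modelscope-text-to-video-technical-report | 0.2930 | 11.09 | 550
align-your-latents-high-resolution-video | 0.2929 | - | -
a-recipe-for-scaling-up-text-to-video | 0.2991 | 8.19 | 441
nuwa-visual-synthesis-pre-training-for-neural | 0.2439 | 47.68 | -
make-pixels-dance-high-dynamic-video | 0.3125 | - | 381
make-a-video-text-to-video-generation-without | 0.3049 | 13.17 | -
godiva-generating-open-domain-videos-from | 0.2402 | - | -
tell-me-what-happened-unifying-text-guided | 0.2644 | 23.4 | -
make-a-video-text-to-video-generation-without | 0.2631 | 23.59 | -
snap-video-scaled-spatiotemporal-transformers | 0.2793 | - | 104.0
videopoet-a-large-language-model-for-zero | 0.3123 | - | 213
magicvideo-efficient-video-generation-with | - | 36.5 | 998
hierarchical-spatio-temporal-decoupling-for | 0.2947 | 8.60 | 406
video-lavit-unified-video-language-pre | 0.3012 | 11.27 | 188.36
align-your-latents-high-resolution-video | 0.2614 | - | -
videocomposer-compositional-video-synthesis | 0.2932 | - | 580
show-1-marrying-pixel-and-latent-diffusion | 0.3072 | 13.08 | 538
snap-video-scaled-spatiotemporal-transformers | 0.2793 | - | 110.4
"""

METRICS = ["CLIPSIM", "FID", "FVD"]
HIGHER_IS_BETTER = {"CLIPSIM": True, "FID": False, "FVD": False}


def parse_rows(text: str) -> list[dict]:
    """Turn each 'name | clipsim | fid | fvd' line into a dict; '-' becomes None."""
    rows = []
    for line in text.strip().splitlines():
        name, *values = [cell.strip() for cell in line.split("|")]
        row = {"model": name}
        for metric, value in zip(METRICS, values):
            row[metric] = None if value == "-" else float(value)
        rows.append(row)
    return rows


def rank(rows: list[dict], metric: str) -> list[tuple[str, float]]:
    """Return (model, value) pairs for rows that report `metric`, best first."""
    scored = [(r["model"], r[metric]) for r in rows if r[metric] is not None]
    return sorted(scored, key=lambda item: item[1], reverse=HIGHER_IS_BETTER[metric])


if __name__ == "__main__":
    rows = parse_rows(TABLE)
    for metric in METRICS:
        best_model, best_value = rank(rows, metric)[0]
        print(f"{metric}: best = {best_model} ({best_value})")
```

Because FID and FVD are missing for many entries, each ranking covers only the subset of models that reported that metric, so the three "best" entries need not come from the same model.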