Text-to-Image Generation on COCO
Metrics
FID
Inception score
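Both metrics are computed on generated samples: FID measures the distance between Inception-v3 feature statistics of generated and real images (lower is better), while Inception score measures the confidence and diversity of Inception-v3 class predictions on generated images (higher is better). As a rough illustration of how these numbers are typically obtained, the sketch below uses the `torchmetrics` package; the actual evaluation protocol (sample count, resolution, guidance settings) differs between the papers in the table and is not specified here, and the image tensors are placeholders.

```python
# Sketch: computing FID and Inception Score with torchmetrics
# (illustrative only -- each model in the table uses its own sampling
#  protocol, sample count, and resolution, which are not given here).
# Requires: pip install torchmetrics[image]
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.inception import InceptionScore

# Placeholder tensors standing in for real COCO images and generated samples;
# torchmetrics expects uint8 images of shape (N, 3, H, W) by default.
real_images = torch.randint(0, 256, (64, 3, 299, 299), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (64, 3, 299, 299), dtype=torch.uint8)

# FID: Frechet distance between Inception-v3 feature statistics
# of the real and generated sets (lower is better).
fid = FrechetInceptionDistance(feature=2048)
fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print("FID:", fid.compute().item())

# Inception Score: exp of the mean KL divergence between conditional
# and marginal Inception-v3 class distributions (higher is better).
inception = InceptionScore()
inception.update(fake_images)
is_mean, is_std = inception.compute()
print("IS:", is_mean.item(), "+/-", is_std.item())
```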
Results
Performance results of various models on this benchmark
Comparison table
Model name | FID ↓ | Inception score ↑ |
---|---|---|
improving-text-to-image-synthesis-using | 23.93 | 25.70 |
fusedream-training-free-text-to-image | 21.16 | 34.26 |
shifted-diffusion-for-text-to-image | 10.6 | - |
retrieval-augmented-multimodal-language | 29.5 | - |
stackgan-realistic-image-synthesis-with | 74.05 | 8.45 |
stylegan-t-unlocking-the-power-of-gans-for | 13.9 | - |
re-imagen-retrieval-augmented-text-to-image | 5.25 | - |
gligen-open-set-grounded-text-to-image | 5.82 | - |
nuwa-visual-synthesis-pre-training-for-neural | - | 18.7 |
nuwa-visual-synthesis-pre-training-for-neural | 27.5 | 17.9 |
lafite-towards-language-free-training-for | 8.12 | 32.34 |
make-a-scene-scene-based-text-to-image | 11.84 | - |
hierarchical-text-conditional-image | 10.39 | - |
photorealistic-text-to-image-diffusion-models | 7.27 | - |
vector-quantized-diffusion-model-for-text-to | 13.86 | - |
data-extrapolation-for-text-to-image | 5.00 | - |
lafite-towards-language-free-training-for | 26.94 | 26.02 |
ediffi-text-to-image-diffusion-models-with-an | 6.95 | - |
l-verse-bidirectional-generation-between | 45.8 | - |
gligen-open-set-grounded-text-to-image | 6.38 | - |
raphael-text-to-image-generation-via-large | 6.61 | - |
knn-diffusion-image-generation-via-large | 12.5 | - |
cogview2-faster-and-better-text-to-image | 17.7 | - |
cogview-mastering-text-to-image-generation | 27.1 | 18.2 |
victr-visual-information-captured-text | - | 10.38 |
nuwa-visual-synthesis-pre-training-for-neural | 12.9 | 27.2 |
fusedream-training-free-text-to-image | 21.89 | 34.67 |
ernie-vilg-2-0-improving-text-to-image | 6.75 | - |
victr-visual-information-captured-text | 32.37 | 32.37 |
generating-multiple-objects-at-spatially | 55.30 | 12.12 |
all-are-worth-words-a-vit-backbone-for-score | 5.95 | - |
re-imagen-retrieval-augmented-text-to-image | 6.88 | - |
dm-gan-dynamic-memory-generative-adversarial | 32.64 | 30.49 |
vector-quantized-diffusion-model-for-text-to | 19.75 | - |
nuwa-visual-synthesis-pre-training-for-neural | 26.0 | 32.2 |
kandinsky-an-improved-text-to-image-synthesis | 8.03 | - |
ernie-vilg-unified-generative-pre-training | 14.7 | - |
retrieval-augmented-multimodal-language | 15.7 | - |
scaling-up-gans-for-text-to-image-synthesis | 9.09 | - |
retrieval-augmented-multimodal-language | 28 | - |
victr-visual-information-captured-text | 29.26 | 28.18 |
scaling-up-gans-for-text-to-image-synthesis | 7.28 | - |
l-verse-bidirectional-generation-between | 37.2 | - |
nuwa-visual-synthesis-pre-training-for-neural | 35.2 | 23.3 |
tr0n-translator-networks-for-0-shot-plug-and | 10.9 | - |
stylegan-t-unlocking-the-power-of-gans-for | 7.3 | - |
improving-text-to-image-synthesis-using | 20.79 | 33.34 |
retrieval-augmented-multimodal-language | 12.63 | - |
swinv2-imagen-hierarchical-vision-transformer | 7.21 | 31.46 |
chatpainter-improving-text-to-image | - | 9.74 |
shifted-diffusion-for-text-to-image | 10.88 | - |
nuwa-visual-synthesis-pre-training-for-neural | 27.1 | 18.2 |
improving-diffusion-based-image-synthesis-1 | 6.21 | - |
191013321 | 24.70 | 27.88 |
all-are-worth-words-a-vit-backbone-for-score | 5.48 | - |
long-and-short-guidance-in-score-identity | 8.15 | - |
cross-modal-contrastive-learning-for-text-to | 9.33 | - |
galip-generative-adversarial-clips-for-text | 12.54 | - |
make-a-scene-scene-based-text-to-image | 7.55 | - |
cogview2-faster-and-better-text-to-image | 24 | - |
recurrent-affine-transformation-for-text-to | 14.6 | - |
simple-diffusion-end-to-end-diffusion-for | 8.3 | - |
truncated-diffusion-probabilistic-models | 6.29 | - |
glide-towards-photorealistic-image-generation | 12.24 | - |
generating-multiple-objects-at-spatially | 33.35 | 24.76 |
nuwa-visual-synthesis-pre-training-for-neural | 9.3 | 30.5 |
high-resolution-image-synthesis-with-latent | 12.63 | - |
gligen-open-set-grounded-text-to-image | 5.61 | - |