Text-to-Image Generation on COCO
Metrics
FID
Inception score
Results
Performance results of various models on this benchmark
| Model name | FID | Inception score | Paper title | Repository |
| --- | --- | --- | --- | --- |
| AttnGAN+CL | 23.93 | 25.70 | Improving Text-to-Image Synthesis Using Contrastive Learning | |
| FuseDream (few-shot, k=5) | 21.16 | 34.26 | FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization | |
| Corgi-Semi | 10.6 | - | Shifted Diffusion for Text-to-image Generation | - |
| Vanilla CM3 | 29.5 | - | Retrieval-Augmented Multimodal Language Modeling | - |
| StackGAN-v1 | 74.05 | 8.45 | StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks | |
| StyleGAN-T (Zero-shot, 256x256) | 13.9 | - | StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis | |
| Re-Imagen (Finetuned) | 5.25 | - | Re-Imagen: Retrieval-Augmented Text-to-Image Generator | - |
| GLIGEN (fine-tuned, Detection data only) | 5.82 | - | GLIGEN: Open-Set Grounded Text-to-Image Generation | |
| DF-GAN (256 x 256) | - | 18.7 | NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | |
| DALL-E (256 x 256) | 27.5 | 17.9 | NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | |
| Lafite | 8.12 | 32.34 | LAFITE: Towards Language-Free Training for Text-to-Image Generation | |
| Make-a-Scene (unfiltered) | 11.84 | - | Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors | - |
| DALL-E 2 | 10.39 | - | Hierarchical Text-Conditional Image Generation with CLIP Latents | - |
| Imagen (zero-shot) | 7.27 | - | Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding | |
| VQ-Diffusion-F | 13.86 | - | Vector Quantized Diffusion Model for Text-to-Image Synthesis | |
| RAT-Diffusion | 5.00 | - | Data Extrapolation for Text-to-image Generation on Small Datasets | - |
| Lafite (zero-shot) | 26.94 | 26.02 | LAFITE: Towards Language-Free Training for Text-to-Image Generation | |
| eDiff-I (zero-shot) | 6.95 | - | eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers | |
| L-Verse | 45.8 | - | L-Verse: Bidirectional Generation Between Image and Text | |
| GLIGEN (fine-tuned, Grounding data) | 6.38 | - | GLIGEN: Open-Set Grounded Text-to-Image Generation | |
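
The results above are not listed in score order, and several entries report only one of the two metrics. As a rough illustration of how they could be ranked, the Python sketch below sorts a handful of rows copied from the results, treating lower FID and higher Inception score as better. The names `ROWS` and `rank_by` are hypothetical helpers introduced here and are not part of the page. The entries also mix zero-shot and fine-tuned settings, so a ranking like this is only a rough guide.

```python
# A few rows copied from the results above: (model, FID, Inception score).
# None stands for a score the leaderboard lists as "-".
ROWS = [
    ("AttnGAN+CL", 23.93, 25.70),
    ("Corgi-Semi", 10.6, None),
    ("StackGAN-v1", 74.05, 8.45),
    ("Re-Imagen (Finetuned)", 5.25, None),
    ("DF-GAN (256 x 256)", None, 18.7),
    ("Lafite", 8.12, 32.34),
    ("RAT-Diffusion", 5.00, None),
]


def rank_by(rows, column, lower_is_better):
    """Return the rows that report the metric in `column`, best first."""
    scored = [row for row in rows if row[column] is not None]
    return sorted(scored, key=lambda row: row[column], reverse=not lower_is_better)


# FID: lower is better. Inception score: higher is better.
best_fid = rank_by(ROWS, 1, lower_is_better=True)
best_is = rank_by(ROWS, 2, lower_is_better=False)

for name, fid, _ in best_fid:
    print(f"{fid:6.2f}  {name}")
```

Rows without a value for the chosen metric are dropped rather than placed last, so a missing score is never mistaken for a poor one.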