Text-to-Image Generation on COCO

Metrics

FID

Inception score
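The table reports two metrics. FID (Fréchet Inception Distance) fits a Gaussian to Inception-v3 features of real and of generated images and computes ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^{1/2}); lower is better. The Inception score is exp(E_x[KL(p(y|x) || p(y))]), computed from the classifier's per-image class distributions; higher is better. The sketch below is a minimal NumPy illustration under stated assumptions: the Inception-v3 features and softmax outputs are assumed to be already extracted, the function names are hypothetical, and the usual practice of averaging the Inception score over several splits is omitted.

```python
import numpy as np


def _psd_sqrt(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # clamp tiny negative eigenvalues caused by round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """Fréchet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    s1_half = _psd_sqrt(sigma1)
    # Tr((S1 S2)^{1/2}) equals Tr((S1^{1/2} S2 S1^{1/2})^{1/2}); the inner matrix is
    # symmetric PSD, so its eigenvalues are real and eigvalsh can be used.
    inner = s1_half @ sigma2 @ s1_half
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def fid_from_features(real: np.ndarray, fake: np.ndarray) -> float:
    """FID from (n_samples, n_features) arrays; assumes features are precomputed."""
    mu_r, mu_f = real.mean(axis=0), fake.mean(axis=0)
    sigma_r = np.cov(real, rowvar=False)
    sigma_f = np.cov(fake, rowvar=False)
    return frechet_distance(mu_r, sigma_r, mu_f, sigma_f)


def inception_score(probs: np.ndarray, eps: float = 1e-12) -> float:
    """Inception score from an (n_images, n_classes) array of p(y|x) rows (single split)."""
    p_y = probs.mean(axis=0)  # marginal class distribution p(y)
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))
```

As a sanity check, identical feature sets give an FID of about 0, and shifting every feature by a constant vector c raises it by ||c||^2. For the Inception score, n images each confidently assigned to a different one of n classes score n, while identical predictions for every image score 1.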

Results

Performance of various models on this benchmark

Model name	FID	Inception score	Paper title	Repository
AttnGAN+CL	23.93	25.70	Improving Text-to-Image Synthesis Using Contrastive Learning
FuseDream (few-shot, k=5)	21.16	34.26	FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization
Corgi-Semi	10.6	-	Shifted Diffusion for Text-to-image Generation	-
Vanilla CM3	29.5	-	Retrieval-Augmented Multimodal Language Modeling	-
StackGAN-v1	74.05	8.45	StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks
StyleGAN-T (Zero-shot, 256x256)	13.9	-	StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
Re-Imagen (Finetuned)	5.25	-	Re-Imagen: Retrieval-Augmented Text-to-Image Generator	-
GLIGEN (fine-tuned, Detection data only)	5.82	-	GLIGEN: Open-Set Grounded Text-to-Image Generation
DF-GAN (256 x 256)	-	18.7	NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
DALL-E (256 x 256)	27.5	17.9	NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
Lafite	8.12	32.34	LAFITE: Towards Language-Free Training for Text-to-Image Generation
Make-a-Scene (unfiltered)	11.84	-	Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors	-
DALL-E 2	10.39	-	Hierarchical Text-Conditional Image Generation with CLIP Latents	-
Imagen (zero-shot)	7.27	-	Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
VQ-Diffusion-F	13.86	-	Vector Quantized Diffusion Model for Text-to-Image Synthesis
RAT-Diffusion	5.00	-	Data Extrapolation for Text-to-image Generation on Small Datasets	-
Lafite (zero-shot)	26.94	26.02	LAFITE: Towards Language-Free Training for Text-to-Image Generation
eDiff-I (zero-shot)	6.95	-	eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
L-Verse	45.8	-	L-Verse: Bidirectional Generation Between Image and Text
GLIGEN (fine-tuned, Grounding data)	6.38	-	GLIGEN: Open-Set Grounded Text-to-Image Generation
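Because lower FID and higher Inception score are better, and many rows report only one of the two ("-" marks a missing value), ranking the table requires skipping unreported cells. The sketch below parses a few rows copied from the table and sorts them per metric; the helper names are hypothetical. Note that the rows mix zero-shot, few-shot and fine-tuned settings and different resolutions, so a single sorted list is not a like-for-like comparison.

```python
from typing import List, Optional, Tuple

# A few rows copied from the results table above: model, FID, Inception score.
TABLE = """\
AttnGAN+CL\t23.93\t25.70
DF-GAN (256 x 256)\t-\t18.7
Re-Imagen (Finetuned)\t5.25\t-
RAT-Diffusion\t5.00\t-
Imagen (zero-shot)\t7.27\t-
Lafite\t8.12\t32.34
"""

Row = Tuple[str, Optional[float], Optional[float]]


def parse_metric(cell: str) -> Optional[float]:
    """Convert a table cell to float, treating "-" as a missing value."""
    cell = cell.strip()
    return None if cell == "-" else float(cell)


def parse_rows(text: str) -> List[Row]:
    rows: List[Row] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, fid, isc = line.split("\t")
        rows.append((name, parse_metric(fid), parse_metric(isc)))
    return rows


def rank_by_fid(rows: List[Row]) -> List[Row]:
    """Lower FID is better: ascending order, rows without FID dropped."""
    return sorted((r for r in rows if r[1] is not None), key=lambda r: r[1])


def rank_by_inception_score(rows: List[Row]) -> List[Row]:
    """Higher Inception score is better: descending order, rows without IS dropped."""
    return sorted((r for r in rows if r[2] is not None), key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    for name, fid, _ in rank_by_fid(parse_rows(TABLE)):
        print(f"{name}: FID {fid}")
```

On this subset, RAT-Diffusion ranks first by FID and Lafite ranks first by Inception score; DF-GAN drops out of the FID ranking because it reports no FID.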
