HyperAI초신경
Text To Image Generation On Coco
Evaluation Metrics
FID
Inception score
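For reference, the two metrics are usually defined as sketched below. The page itself gives no formulas, so the notation is an assumption: \mu_r, \Sigma_r and \mu_g, \Sigma_g denote the mean and covariance of Inception features for real and generated images, p(y \mid x) is the Inception classifier's label distribution for a generated image x, and p(y) is its marginal over generated images. Lower FID is better; higher Inception score is better.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)

\mathrm{IS} = \exp\!\left( \mathbb{E}_{x \sim p_g}
  \left[ D_{\mathrm{KL}}\big( p(y \mid x) \,\|\, p(y) \big) \right] \right)
```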
Evaluation Results
Performance of each model on this benchmark
| Model | FID | Inception score | Paper Title | Repository |
| --- | --- | --- | --- | --- |
| AttnGAN+CL | 23.93 | 25.70 | Improving Text-to-Image Synthesis Using Contrastive Learning | |
| FuseDream (few-shot, k=5) | 21.16 | 34.26 | FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization | |
| Corgi-Semi | 10.6 | - | Shifted Diffusion for Text-to-image Generation | - |
| Vanilla CM3 | 29.5 | - | Retrieval-Augmented Multimodal Language Modeling | - |
| StackGAN-v1 | 74.05 | 8.45 | StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks | |
| StyleGAN-T (Zero-shot, 256x256) | 13.9 | - | StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis | |
| Re-Imagen (Finetuned) | 5.25 | - | Re-Imagen: Retrieval-Augmented Text-to-Image Generator | - |
| GLIGEN (fine-tuned, Detection data only) | 5.82 | - | GLIGEN: Open-Set Grounded Text-to-Image Generation | |
| DF-GAN (256 x 256) | - | 18.7 | NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | |
| DALL-E (256 x 256) | 27.5 | 17.9 | NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | |
| Lafite | 8.12 | 32.34 | LAFITE: Towards Language-Free Training for Text-to-Image Generation | |
| Make-a-Scene (unfiltered) | 11.84 | - | Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors | - |
| DALL-E 2 | 10.39 | - | Hierarchical Text-Conditional Image Generation with CLIP Latents | - |
| Imagen (zero-shot) | 7.27 | - | Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding | |
| VQ-Diffusion-F | 13.86 | - | Vector Quantized Diffusion Model for Text-to-Image Synthesis | |
| RAT-Diffusion | 5.00 | - | Data Extrapolation for Text-to-image Generation on Small Datasets | - |
| Lafite (zero-shot) | 26.94 | 26.02 | LAFITE: Towards Language-Free Training for Text-to-Image Generation | |
| eDiff-I (zero-shot) | 6.95 | - | eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers | |
| L-Verse | 45.8 | - | L-Verse: Bidirectional Generation Between Image and Text | |
| GLIGEN (fine-tuned, Grounding data) | 6.38 | - | GLIGEN: Open-Set Grounded Text-to-Image Generation | |