HyperAI
Image-Text Retrieval
Image To Text Retrieval On Coco
Evaluation Metrics
Recall@1
Recall@5
Recall@10
Evaluation Results
Performance of each model on this benchmark
| Model | Recall@1 | Recall@5 | Recall@10 | Paper Title |
| --- | --- | --- | --- | --- |
| Oscar | - | - | 99.8 | Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks |
| BLIP-2 (ViT-G, fine-tuned) | 85.4 | 97.0 | 98.5 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
| ONE-PEACE (ViT-G, w/o ranking) | 84.1 | 96.3 | 98.3 | ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities |
| BLIP-2 (ViT-L, fine-tuned) | 83.5 | 96.0 | 98.0 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
| Unicoder-VL | - | - | 97.2 | Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training |
| IAIS | 67.78 | 89.7 | 94.48 | Learning Relation Alignment for Calibrated Cross-modal Retrieval |
| CLIP (zero-shot) | 58.4 | 81.5 | 88.1 | Learning Transferable Visual Models From Natural Language Supervision |
| DVSA | - | - | 74.8 | Deep Visual-Semantic Alignments for Generating Image Descriptions |
| SigLIP (ViT-L, zero-shot) | 70.6 | - | - | Sigmoid Loss for Language Image Pre-Training |
| FLAVA (ViT-B, zero-shot) | 42.74 | 76.76 | - | FLAVA: A Foundational Language And Vision Alignment Model |
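For image-to-text retrieval, Recall@K is the fraction of image queries whose ground-truth caption appears among the top K retrieved captions (on COCO, where each image has several reference captions, a query typically counts as a hit if any of them ranks in the top K). A minimal sketch of this computation, assuming a precomputed image-caption similarity matrix; this is illustrative only, not the leaderboard's evaluation code:

```python
# Sketch of Recall@K for image-to-text retrieval (assumed setup:
# sim[i][j] is the similarity of caption j to image i, and gt[i] is the
# set of ground-truth caption indices for image i).

def recall_at_k(sim, gt, k):
    """Fraction of images with at least one ground-truth caption in top k."""
    hits = 0
    for i, row in enumerate(sim):
        # Rank caption indices by descending similarity, keep the top k.
        topk = sorted(range(len(row)), key=lambda j: -row[j])[:k]
        if gt[i] & set(topk):
            hits += 1
    return hits / len(sim)

# Toy example: 2 images, 4 candidate captions, one correct caption each.
sim = [
    [0.9, 0.1, 0.8, 0.2],  # image 0: caption 0 is correct and ranked first
    [0.3, 0.2, 0.1, 0.4],  # image 1: caption 2 is correct but ranked last
]
gt = [{0}, {2}]
print(recall_at_k(sim, gt, 1))   # 0.5: only image 0 hits at K=1
print(recall_at_k(sim, gt, 10))  # 1.0: K=10 covers every caption
```

Leaderboard entries report this over the full test split, so Recall@10 is by construction at least as high as Recall@5 and Recall@1, which matches the column ordering above.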