HyperAI
Image-Text Retrieval
Image To Text Retrieval On Coco
Evaluation Metrics
Recall@1
Recall@5
Recall@10
Evaluation Results
Performance of each model on this benchmark
| Model | Recall@1 | Recall@5 | Recall@10 | Paper Title |
| --- | --- | --- | --- | --- |
| Oscar | - | - | 99.8 | Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks |
| BLIP-2 (ViT-G, fine-tuned) | 85.4 | 97.0 | 98.5 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
| ONE-PEACE (ViT-G, w/o ranking) | 84.1 | 96.3 | 98.3 | ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities |
| BLIP-2 (ViT-L, fine-tuned) | 83.5 | 96.0 | 98.0 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
| Unicoder-VL | - | - | 97.2 | Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training |
| IAIS | 67.78 | 89.7 | 94.48 | Learning Relation Alignment for Calibrated Cross-modal Retrieval |
| CLIP (zero-shot) | 58.4 | 81.5 | 88.1 | Learning Transferable Visual Models From Natural Language Supervision |
| DVSA | - | - | 74.8 | Deep Visual-Semantic Alignments for Generating Image Descriptions |
| SigLIP (ViT-L, zero-shot) | 70.6 | - | - | Sigmoid Loss for Language Image Pre-Training |
| FLAVA (ViT-B, zero-shot) | 42.74 | 76.76 | - | FLAVA: A Foundational Language And Vision Alignment Model |
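For image-to-text retrieval, Recall@K is the fraction of image queries whose ground-truth caption appears among the top K retrieved captions (on COCO, where each image has several reference captions, a query typically counts as a hit if any of them ranks in the top K). A minimal sketch of this computation, assuming a precomputed image-caption similarity matrix; this is illustrative only, not the leaderboard's evaluation code:

```python
# Sketch of Recall@K for image-to-text retrieval (assumed setup:
# sim[i][j] is the similarity of caption j to image i, and gt[i] is the
# set of ground-truth caption indices for image i).

def recall_at_k(sim, gt, k):
    """Fraction of images with at least one ground-truth caption in top k."""
    hits = 0
    for i, row in enumerate(sim):
        # Rank caption indices by descending similarity, keep the top k.
        topk = sorted(range(len(row)), key=lambda j: -row[j])[:k]
        if gt[i] & set(topk):
            hits += 1
    return hits / len(sim)

# Toy example: 2 images, 4 candidate captions, one correct caption each.
sim = [
    [0.9, 0.1, 0.8, 0.2],  # image 0: caption 0 is correct and ranked first
    [0.3, 0.2, 0.1, 0.4],  # image 1: caption 2 is correct but ranked last
]
gt = [{0}, {2}]
print(recall_at_k(sim, gt, 1))   # 0.5: only image 0 hits at K=1
print(recall_at_k(sim, gt, 10))  # 1.0: K=10 covers every caption
```

Leaderboard entries report this over the full test split, so Recall@10 is by construction at least as high as Recall@5 and Recall@1, which matches the column ordering above.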