Image To Text Retrieval On Flickr30K
Evaluation metrics
Recall@1
Recall@10
Recall@5
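Recall@K for image-to-text retrieval is the percentage of image queries for which at least one ground-truth caption appears among the K highest-ranked captions. The following is a minimal sketch of that computation, assuming a precomputed image-by-caption similarity matrix; the function name `recall_at_k`, the variable names, and the toy scores are hypothetical and are not taken from any of the listed papers. On Flickr30K, results are commonly reported on a 1K-image test split with five captions per image, so individual evaluation scripts may differ in detail from this sketch.

```python
import numpy as np

def recall_at_k(similarity, gt_captions, k):
    """Percentage of image queries with at least one ground-truth caption in the top k.

    similarity:  (num_images, num_captions) array, higher means more similar.
    gt_captions: one list of ground-truth caption indices per image.
    """
    # Indices of the k highest-scoring captions for each image (one row per image).
    top_k = np.argsort(-similarity, axis=1)[:, :k]
    # An image counts as a hit if any of its ground-truth captions is in its top k.
    hits = [bool(set(row.tolist()) & set(gt)) for row, gt in zip(top_k, gt_captions)]
    return 100.0 * float(np.mean(hits))

# Toy data (hypothetical): 3 images, 6 captions, 2 ground-truth captions per image.
similarity = np.array([
    [0.9, 0.2, 0.1, 0.3, 0.0, 0.4],  # image 0: top caption is 0 (correct)
    [0.8, 0.1, 0.7, 0.2, 0.3, 0.0],  # image 1: top is 0 (wrong), second is 2 (correct)
    [0.5, 0.6, 0.4, 0.3, 0.2, 0.1],  # image 2: correct captions 4 and 5 rank last
])
gt_captions = [[0, 1], [2, 3], [4, 5]]

for k in (1, 2, 5):
    print(f"Recall@{k}: {recall_at_k(similarity, gt_captions, k):.1f}")
```

Because a hit at a smaller K is also a hit at every larger K, Recall@1 can never exceed Recall@5, which in turn can never exceed Recall@10. This is why the Recall@10 column saturates at or near 100 for the strongest models while Recall@1 still separates them.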
Evaluation results
Performance of each model on this benchmark
| Model name | Recall@1 | Recall@10 | Recall@5 | Paper title |
| --- | --- | --- | --- | --- |
| InternVL-G-FT (finetuned, w/o ranking) | 97.9 | 100 | 100 | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks |
| BLIP-2 ViT-G (zero-shot, 1K test set) | 97.6 | 100 | 100 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
| ONE-PEACE (finetuned, w/o ranking) | 97.6 | 100 | 100 | ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities |
| InternVL-C-FT (finetuned, w/o ranking) | 97.2 | 100 | 100 | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks |
| BLIP-2 ViT-L (zero-shot, 1K test set) | 96.9 | 100 | 100 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
| ERNIE-ViL 2.0 | 96.1 | 100.0 | 99.9 | ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training |
| ALBEF | 95.9 | 100.0 | 99.8 | Align before Fuse: Vision and Language Representation Learning with Momentum Distillation |
| ALBEF | 92.6 | 99.9 | 99.3 | HADA: A Graph-based Amalgamation Framework in Image-text Retrieval |
| UNITER | 87.3 | 99.2 | 98 | HADA: A Graph-based Amalgamation Framework in Image-text Retrieval |
| GSMN | 76.4 | 97.3 | 94.3 | A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval |
| LGSGM | 71 | 96.1 | 91.9 | A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval |
Image To Text Retrieval On Flickr30K | SOTA | HyperAI초신경