Visual Reasoning on NLVR2 Test
Evaluation metric: Accuracy

Evaluation results: performance of each model on this benchmark, reported as accuracy (%) on the NLVR2 test set.
| Model | Accuracy (%) | Paper |
| --- | --- | --- |
| CoCa | 87.0 | CoCa: Contrastive Captioners are Image-Text Foundation Models |
| UNITER (Large) | 79.5 | UNITER: UNiversal Image-TExt Representation Learning |
| SimVLM | 85.15 | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision |
| VLMo | 86.86 | VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts |
| BLIP-129M | 83.09 | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation |
| X²-VLM (large) | 89.4 | X²-VLM: All-In-One Pre-trained Model For Vision-Language Tasks |
| X²-VLM (base) | 87.0 | X²-VLM: All-In-One Pre-trained Model For Vision-Language Tasks |
| X-VLM (base) | 84.76 | Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts |
| SOHO | 77.32 | Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning |
| LXMERT | 76.2 | LXMERT: Learning Cross-Modality Encoder Representations from Transformers |
| ViLT-B/32 | 76.13 | ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision |
| ALBEF (14M) | 82.55 | Align before Fuse: Vision and Language Representation Learning with Momentum Distillation |
| BEiT-3 | 92.58 | Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks |
| XFM (base) | 88.4 | Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks |
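For reference, NLVR2 is a binary task: each example pairs a natural-language sentence with two photographs, and the model must decide whether the sentence is true of the pair. The Accuracy column above is simply the percentage of test examples answered correctly. A minimal sketch of the metric follows; the field names (`label`, `prediction`) are illustrative assumptions, not tied to any particular NLVR2 data release.

```python
# Minimal sketch of NLVR2 accuracy: the fraction of test examples
# whose predicted True/False label matches the gold label.
# The dict keys "label" and "prediction" are illustrative, not an
# official NLVR2 schema.

def nlvr2_accuracy(examples):
    """examples: iterable of dicts with boolean 'label' and 'prediction'."""
    examples = list(examples)
    correct = sum(1 for ex in examples if ex["prediction"] == ex["label"])
    return 100.0 * correct / len(examples)

if __name__ == "__main__":
    demo = [
        {"label": True,  "prediction": True},
        {"label": False, "prediction": True},
        {"label": False, "prediction": False},
        {"label": True,  "prediction": True},
    ]
    print(f"Accuracy: {nlvr2_accuracy(demo):.2f}%")  # 75.00%
```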