Visual Reasoning on NLVR2 (test)
Evaluation Metric: Accuracy
Evaluation Results: performance of each model on this benchmark
| Model Name | Accuracy | Paper Title | Repository |
|---|---|---|---|
| CoCa | 87.0 | CoCa: Contrastive Captioners are Image-Text Foundation Models | - |
| UNITER (Large) | 79.5 | UNITER: UNiversal Image-TExt Representation Learning | - |
| SimVLM | 85.15 | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | - |
| VLMo | 86.86 | VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts | - |
| BLIP-129M | 83.09 | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | - |
| X2-VLM (large) | 89.4 | X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | - |
| X2-VLM (base) | 87.0 | X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | - |
| X-VLM (base) | 84.76 | Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts | - |
| SOHO | 77.32 | Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning | - |
| LXMERT | 76.2 | LXMERT: Learning Cross-Modality Encoder Representations from Transformers | - |
| ViLT-B/32 | 76.13 | ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision | - |
| ALBEF (14M) | 82.55 | Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | - |
| BEiT-3 | 92.58 | Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks | - |
| XFM (base) | 88.4 | Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks | - |
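
For reference, the Accuracy metric here is plain classification accuracy: each NLVR2 example pairs two images with a natural-language statement, and the model predicts whether the statement is true of the image pair. Below is a minimal scoring sketch in Python; the file names and the prediction-file format are assumptions for illustration, not part of any official evaluation tooling.

```python
import json

def nlvr2_accuracy(predictions: dict, labels: dict) -> float:
    """Accuracy (%) on NLVR2. Both arguments map an example
    identifier to its "True"/"False" label; the score is the
    percentage of examples where the two labels agree."""
    correct = sum(predictions[k] == labels[k] for k in labels)
    return 100.0 * correct / len(labels)

# Hypothetical usage, assuming NLVR2's jsonlines test split with
# "identifier" and "label" fields, and a predictions JSON mapping
# identifier -> "True"/"False":
# with open("test1.json") as f:
#     labels = {ex["identifier"]: ex["label"]
#               for ex in (json.loads(line) for line in f)}
# with open("predictions.json") as f:
#     predictions = json.load(f)
# print(f"Accuracy: {nlvr2_accuracy(predictions, labels):.2f}")
```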