Visual Entailment on SNLI-VE Test
Evaluation Metric
Accuracy
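Accuracy here is the share of SNLI-VE test pairs whose predicted label (entailment, neutral, or contradiction) matches the gold label, reported as a percentage. Below is a minimal illustrative sketch of that computation; the function name and toy labels are hypothetical and not taken from any of the listed papers.

```python
# Minimal sketch (not tied to any specific model above): accuracy on SNLI-VE,
# where each image-hypothesis pair is labeled entailment / neutral / contradiction.
from typing import Sequence

LABELS = ("entailment", "neutral", "contradiction")  # the three SNLI-VE classes

def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of examples whose predicted label matches the gold label, in percent."""
    assert len(predictions) == len(gold), "prediction/gold length mismatch"
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)

# Hypothetical toy example: 3 of 4 pairs classified correctly -> 75.0
print(accuracy(
    ["entailment", "neutral", "contradiction", "entailment"],
    ["entailment", "neutral", "contradiction", "neutral"],
))
```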
Evaluation Results
Performance of each model on this benchmark:
| Model Name | Accuracy | Paper Title |
|---|---|---|
| UNITER (Large) | 78.98 | UNITER: UNiversal Image-TExt Representation Learning |
| EVE-ROI* | 70.47 | Visual Entailment: A Novel Task for Fine-Grained Image Understanding |
| MAD (Single Model, Formerly CLIP-TD) | 80.32 | Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks |
| SimVLM | 86.32 | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision |
| SOHO | 84.95 | Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning |
| Prompt Tuning | 90.12 | Prompt Tuning for Generative Multimodal Pretrained Models |
| CoCa | 87.1 | CoCa: Contrastive Captioners are Image-Text Foundation Models |
| OFA | 91.2 | OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework |