Image Retrieval on PhotoChat
Metrics
R@1
R@10
R@5
Sum(R@1,5,10)
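The metrics above are Recall@K scores for ranked retrieval, plus their sum. The following minimal sketch shows how they can be computed. It assumes that each query (a dialogue context) has exactly one ground-truth image, that R@K is reported as a percentage, and that the input is a list of 1-based ranks of the correct image. The `ranks` input format and the function names `recall_at_k` and `recall_sum` are hypothetical, chosen for illustration, and are not taken from any benchmark codebase.

```python
from typing import Sequence


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Percentage of queries whose ground-truth image is ranked within the top k.

    Hypothetical input format: ranks[i] is the 1-based rank of the correct
    image for query i among all candidate images.
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    hits = sum(1 for r in ranks if r <= k)
    return 100.0 * hits / len(ranks)


def recall_sum(ranks: Sequence[int]) -> float:
    """Sum(R@1,5,10): R@1 + R@5 + R@10, each expressed as a percentage."""
    return sum(recall_at_k(ranks, k) for k in (1, 5, 10))


if __name__ == "__main__":
    # Hypothetical ranks of the correct image for four dialogue queries.
    ranks = [1, 3, 7, 20]
    print(recall_at_k(ranks, 1))   # 25.0
    print(recall_at_k(ranks, 5))   # 50.0
    print(recall_at_k(ranks, 10))  # 75.0
    print(recall_sum(ranks))       # 150.0
```

Because each R@K is bounded by 100, the sum ranges from 0 to 300. By construction, R@1 ≤ R@5 ≤ R@10 for any single model.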
Results
Performance of various models on this benchmark
Model | R@1 | R@10 | R@5 | Sum(R@1,5,10) | Paper Title | Repository |
---|---|---|---|---|---|---|
VLMo | 11.5 | 39.4 | 30.0 | 83.2 | VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts | |
ViLT | 11.5 | 25.6 | 33.8 | 71.0 | ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision | |
PaCE | 15.2 | 49.6 | 36.7 | 101.5 | PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts | |
SCAN | 10.4 | 37.1 | 27.0 | 74.5 | Stacked Cross Attention for Image-Text Matching | |
DE++ | 9.0 | 35.7 | 26.4 | 71.1 | PhotoChat: A Human-Human Dialogue Dataset with Photo Sharing Behavior for Joint Image-Text Modeling | - |