Zero-Shot Transfer Image Classification On 5
Evaluation Metric
Accuracy (Private)
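Accuracy here is typically reported as top-1 zero-shot accuracy: the percentage of test images whose highest-scoring class matches the ground-truth label. A minimal sketch of that computation (plain NumPy; the `similarities` and `labels` arrays are hypothetical stand-ins for a model's image-to-class scores and the dataset's labels):

```python
import numpy as np

# Hypothetical stand-ins: per-image similarity scores against every candidate
# class (shape [num_images, num_classes]) and the ground-truth class indices.
similarities = np.array([[0.91, 0.05, 0.04],
                         [0.20, 0.65, 0.15],
                         [0.50, 0.30, 0.20]])
labels = np.array([0, 1, 2])

# Top-1 accuracy: fraction of images whose best-scoring class is the true one.
predictions = similarities.argmax(axis=1)
top1 = (predictions == labels).mean() * 100
print(f"top-1 accuracy: {top1:.1f}%")  # 66.7% for this toy example
```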
Evaluation Results
Performance of each model on this benchmark:
Model Name | Accuracy (Private) | Paper Title | Repository |
---|---|---|---|
AltCLIP | 69.5 | AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities | - |
LiT-tuning | 79.4 | LiT: Zero-Shot Transfer with Locked-image text Tuning | - |
EVA-CLIP-E/14+ | 82.1 | EVA-CLIP: Improved Training Techniques for CLIP at Scale | - |
EVA-CLIP-18B | 87.3 | EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters | - |
LiT-22B | 90.1 | Scaling Vision Transformers to 22 Billion Parameters | - |
LiT ViT-e | 88.0 | PaLI: A Jointly-Scaled Multilingual Language-Image Model | - |
CLIP | 77.2 | Learning Transferable Visual Models From Natural Language Supervision | - |
PaLI | 44.7 | PaLI: A Jointly-Scaled Multilingual Language-Image Model | - |
BASIC | 85.6 | Combined Scaling for Zero-shot Transfer Learning | - |
ALIGN | 75.8 | Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision | - |
BASIC (Lion) | 86.4 | - | - |
InternVL-C | 83.8 | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | - |
CoCa | 90.2 | CoCa: Contrastive Captioners are Image-Text Foundation Models | - |
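Most of the listed models are contrastive image-text models evaluated with a CLIP-style zero-shot protocol: class names are turned into natural-language prompts, images and prompts are embedded into a shared space, and the predicted class is the prompt most similar to the image, with no fine-tuning on the target dataset. The sketch below illustrates this protocol with the openai/CLIP reference package; the model variant ("ViT-B/32"), prompt template, class list, and image path are illustrative assumptions, not details taken from the leaderboard entries.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # small public CLIP variant

# Hypothetical class list and prompt template; real evaluations use the target
# dataset's label set and often average scores over many prompt templates.
class_names = ["airplane", "dog", "pizza"]
prompts = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(prompts)
    # Cosine similarity between the image and each class prompt.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    similarity = (image_features @ text_features.T).squeeze(0)

# The predicted class is the prompt closest to the image in embedding space.
predicted = class_names[similarity.argmax().item()]
print(f"predicted class: {predicted}")
```

Accuracy on the benchmark is then the fraction of test images for which this argmax prediction equals the ground-truth label, as in the metric sketch above.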