Zero Shot Transfer Image Classification On 5
Evaluation Metric
Accuracy (Private)
Results
Performance of each model on this benchmark
Model | Accuracy (Private) | Paper Title | Repository |
---|---|---|---|
AltCLIP | 69.5 | AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities | |
LiT-tuning | 79.4 | LiT: Zero-Shot Transfer with Locked-image text Tuning | |
EVA-CLIP-E/14+ | 82.1 | EVA-CLIP: Improved Training Techniques for CLIP at Scale | |
EVA-CLIP-18B | 87.3 | EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters | - |
LiT-22B | 90.1 | Scaling Vision Transformers to 22 Billion Parameters | |
LiT ViT-e | 88.0 | PaLI: A Jointly-Scaled Multilingual Language-Image Model | |
CLIP | 77.2 | Learning Transferable Visual Models From Natural Language Supervision | |
PaLI | 44.7 | PaLI: A Jointly-Scaled Multilingual Language-Image Model | |
BASIC | 85.6 | Combined Scaling for Zero-shot Transfer Learning | - |
ALIGN | 75.8 | Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision | |
BASIC (Lion) | 86.4 | - | - |
InternVL-C | 83.8 | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | |
CoCa | 90.2 | CoCa: Contrastive Captioners are Image-Text Foundation Models | |