Zero-Shot Transfer Image Classification On 4
Metrics
Accuracy
Results
Performance results of various models on this benchmark
Model name | Accuracy | Paper Title | Repository
---|---|---|---
CoCa | 96.5 | CoCa: Contrastive Captioners are Image-Text Foundation Models | |
CLIP | 88.9 | Learning Transferable Visual Models From Natural Language Supervision | |
ALIGN | 92.2 | Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision | |
BASIC (Lion) | 96.8 | - | - |
PaLI | 81.97 | PaLI: A Jointly-Scaled Multilingual Language-Image Model | |
EVA-CLIP-E/14+ | 94.5 | EVA-CLIP: Improved Training Techniques for CLIP at Scale | |
LiT ViT-e | 96.1 | PaLI: A Jointly-Scaled Multilingual Language-Image Model | |
BASIC | 95.7 | Combined Scaling for Zero-shot Transfer Learning | - |
LiT-22B | 96.0 | Scaling Vision Transformers to 22 Billion Parameters | |
AltCLIP | 87.2 | AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities | |
EVA-CLIP-18B | 95.7 | EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters | - |
LiT-tuning | 93.9 | LiT: Zero-Shot Transfer with Locked-image text Tuning | |
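The models above all follow the same CLIP-style zero-shot protocol: class names are embedded as text prompts, the image is embedded by a vision encoder, and the predicted class is the prompt with the highest cosine similarity. Below is a minimal sketch of that scoring step with synthetic NumPy embeddings standing in for real encoder outputs; the function name, temperature value, and vectors are illustrative, not from any of the listed papers.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, temperature=100.0):
    """CLIP-style zero-shot scoring: L2-normalize both sides, take
    cosine similarities, and softmax over the scaled logits."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = temperature * (txt @ img)          # one logit per class prompt
    probs = np.exp(logits - logits.max())       # numerically stable softmax
    return probs / probs.sum()

# Synthetic example: 3 class prompts, 4-dim embeddings (illustrative only).
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(3, 4))
image_emb = text_embs[1] + 0.1 * rng.normal(size=4)  # image close to class 1
probs = zero_shot_classify(image_emb, text_embs)
print(int(probs.argmax()))
```

In real systems the text embeddings are typically averaged over several prompt templates ("a photo of a {class}", etc.) before normalization, which is where much of the reported accuracy difference between models comes from.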