Image Retrieval On Muge Retrieval
Evaluation Metrics
Mean Recall
R@1
R@10
R@5
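The task is text-to-image retrieval (text queries, image candidates), presumably. The sketch below shows how these metrics are typically computed. It assumes two things. First, R@K is the percentage of queries that have at least one ground-truth image among the top K ranked images. Second, Mean Recall is the plain average of R@1, R@5 and R@10, which matches the values in the results table. The function names, score matrix and relevance sets are hypothetical illustrations, not the official MUGE evaluation script.

```python
from typing import List, Sequence, Set


def recall_at_k(scores: Sequence[Sequence[float]],
                relevant: Sequence[Set[int]], k: int) -> float:
    """Percentage of queries whose top-k images contain a relevant image.

    scores[i][j]: similarity between text query i and image j (hypothetical input).
    relevant[i]: set of ground-truth image indices for query i (hypothetical input).
    """
    hits = 0
    for row, gold in zip(scores, relevant):
        # Rank image indices by descending similarity score.
        ranked: List[int] = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        # A query counts as a hit if any ground-truth image is in the top k.
        if gold & set(ranked[:k]):
            hits += 1
    return 100.0 * hits / len(scores)


def mean_recall(scores, relevant, ks=(1, 5, 10)) -> float:
    """Average of R@K over the cutoffs (assumed to be R@1, R@5, R@10)."""
    return sum(recall_at_k(scores, relevant, k) for k in ks) / len(ks)


# Toy example: 3 text queries x 3 images, so cutoffs 1-3 are used instead of 1/5/10.
scores = [[0.9, 0.1, 0.5],
          [0.2, 0.8, 0.3],
          [0.1, 0.2, 0.3]]
relevant = [{0}, {2}, {0}]
print(recall_at_k(scores, relevant, 1))
print(mean_recall(scores, relevant, ks=(1, 2, 3)))
```

On a real benchmark the score matrix would come from the model's text and image embeddings, and `ks` would stay at its default `(1, 5, 10)`.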
Evaluation Results
Performance of each model on this benchmark
Model | Mean Recall | R@1 | R@10 | R@5 | Paper Title | Repository |
---|---|---|---|---|---|---|
CN-CLIP (ViT-H/14) | 83.6 | 68.9 | 93.1 | 88.7 | Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | |
CN-CLIP (RN50) | 69.2 | 48.6 | 84.0 | 75.1 | Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | |
CN-CLIP (ViT-B/16) | 77.4 | 58.4 | 90.0 | 83.6 | Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | |
Wukong (ViT-L/14) | 72.1 | 52.7 | 85.6 | 77.9 | Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark | |
CN-CLIP (ViT-L/14) | 80.1 | 63.3 | 91.3 | 85.6 | Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | |
Wukong (ViT-B/32) | 61.2 | 39.2 | 77.4 | 66.9 | Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark | |
R2D2 (ViT-L/14) | 77.5 | 60.1 | 89.4 | 82.9 | CCMB: A Large-scale Chinese Cross-modal Benchmark | |
R2D2 (ViT-B) | 68.7 | 47.4 | 83.5 | 75.1 | CCMB: A Large-scale Chinese Cross-modal Benchmark | |
CN-CLIP (ViT-L/14@336px) | 81.3 | 65.3 | 92.1 | 86.7 | Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | |
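As a quick consistency check, assume Mean Recall is the average of R@1, R@5 and R@10. The sketch below recomputes that average from the table values and lists the models from highest to lowest Mean Recall. Each R@K entry is itself rounded to one decimal place, so the recomputed averages differ from the reported Mean Recall by less than 0.1 points (for example, 77.33 versus 77.4 for CN-CLIP (ViT-B/16)). This gap is consistent with rounding. The `ROWS` list and `implied_mean` helper are illustrative names, not part of any benchmark tooling.

```python
# (model, Mean Recall, R@1, R@10, R@5), in the column order of the table above.
ROWS = [
    ("CN-CLIP (ViT-H/14)", 83.6, 68.9, 93.1, 88.7),
    ("CN-CLIP (RN50)", 69.2, 48.6, 84.0, 75.1),
    ("CN-CLIP (ViT-B/16)", 77.4, 58.4, 90.0, 83.6),
    ("Wukong (ViT-L/14)", 72.1, 52.7, 85.6, 77.9),
    ("CN-CLIP (ViT-L/14)", 80.1, 63.3, 91.3, 85.6),
    ("Wukong (ViT-B/32)", 61.2, 39.2, 77.4, 66.9),
    ("R2D2 (ViT-L/14)", 77.5, 60.1, 89.4, 82.9),
    ("R2D2 (ViT-B)", 68.7, 47.4, 83.5, 75.1),
    ("CN-CLIP (ViT-L/14@336px)", 81.3, 65.3, 92.1, 86.7),
]


def implied_mean(r1: float, r10: float, r5: float) -> float:
    """Recompute Mean Recall as the plain average of the three rounded R@K values."""
    return (r1 + r5 + r10) / 3


# Print models from best to worst Mean Recall, with the gap to the recomputed average.
for name, mr, r1, r10, r5 in sorted(ROWS, key=lambda row: row[1], reverse=True):
    avg = implied_mean(r1, r10, r5)
    print(f"{name:26s} MR={mr:5.1f}  avg(R@1,R@5,R@10)={avg:6.2f}  gap={mr - avg:+.2f}")
```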