Image Retrieval on CREPE (Vision-Language)
Evaluation Metrics
Recall@1 (HN-Atom + HN-Comp, SC)
Recall@1 (HN-Atom + HN-Comp, UC)
Recall@1 (HN-Atom, UC)
Recall@1 (HN-Comp, UC)
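All metrics above are Recall@1 under hard-negative retrieval: for each query the model must rank the ground-truth match above a small set of distractors (HN-Atom and HN-Comp roughly denote atom-level and compound-level hard negatives, and SC/UC the seen/unseen compound splits, following the CREPE paper). As a minimal illustrative sketch of how Recall@1 is computed from a query-by-candidate similarity matrix (not the official CREPE evaluation code; the array shapes and toy numbers below are assumptions for illustration):

```python
import numpy as np

def recall_at_k(similarity: np.ndarray, gt_index: np.ndarray, k: int = 1) -> float:
    """Fraction of queries whose ground-truth candidate appears in the top-k results.

    similarity: (num_queries, num_candidates) similarity scores,
                e.g. cosine similarities from a CLIP-style model.
    gt_index:   (num_queries,) index of the correct candidate for each query.
    """
    # Indices of the k highest-scoring candidates for each query.
    topk = np.argsort(-similarity, axis=1)[:, :k]
    hits = (topk == gt_index[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example: 3 queries, each scored against 4 candidates
# (1 positive plus hard negatives).
sims = np.array([
    [0.9, 0.2, 0.1, 0.3],
    [0.1, 0.8, 0.7, 0.2],
    [0.3, 0.4, 0.2, 0.1],
])
gt = np.array([0, 1, 3])
print(recall_at_k(sims, gt, k=1))  # ~0.667: the third query ranks a negative first
```

A random ranker over N candidates scores about 1/N on this metric, which is why the Random baseline in the table below sits near 9.09 (1/11), 20.00 (1/5), and 14.29 (1/7) depending on how many hard negatives each setting uses.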
Evaluation Results
Performance of each model on this benchmark:
| Model | Recall@1 (HN-Atom + HN-Comp, SC) | Recall@1 (HN-Atom + HN-Comp, UC) | Recall@1 (HN-Atom, UC) | Recall@1 (HN-Comp, UC) | Paper Title | Repository |
| --- | --- | --- | --- | --- | --- | --- |
| ViT-B-16 (LAION400M) | 37.01 | 30.81 | 44.93 | 59.00 | CREPE: Can Vision-Language Foundation Models Reason Compositionally? | |
| Swin-T (CLIP, CC-12M) | - | - | 37.3 | 44.1 | Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | - |
| RN50 (CC12M) | 23.26 | 19.96 | 34.88 | 45.27 | CREPE: Can Vision-Language Foundation Models Reason Compositionally? | |
| ViT-L-14 (LAION400M) | 39.44 | 33.81 | 47.86 | 60.78 | CREPE: Can Vision-Language Foundation Models Reason Compositionally? | |
| RN-50 (CLIP, CC-12M) | - | - | 36.7 | 42.9 | Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | - |
| MosaiCLIP (CC-FT) | - | - | 40.9 | 72.4 | Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | - |
| NegCLIP (YFCC-FT) | - | - | 39.0 | 38.8 | Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | - |
| ViT-B-32 (LAION400M) | 34.28 | 28.00 | 42.75 | 54.80 | CREPE: Can Vision-Language Foundation Models Reason Compositionally? | |
| CLIP-FT (YFCC-FT) | - | - | 38.3 | 36.4 | Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | - |
| RN101 (YFCC15M) | 22.74 | 20.50 | 39.50 | 39.56 | CREPE: Can Vision-Language Foundation Models Reason Compositionally? | |
| ViT-B-16+240 (LAION400M) | 37.32 | 32.26 | 46.53 | 60.19 | CREPE: Can Vision-Language Foundation Models Reason Compositionally? | |
| CLIP-FT (CC-FT) | - | - | 35.6 | 45.8 | Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | - |
| CLIP (YFCC-FT) | - | - | 39.5 | 39.8 | Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | - |
| CLIP (CC-FT) | - | - | 35.0 | 45.1 | Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | - |
| Random | 9.09 | 9.09 | 20.00 | 14.29 | CREPE: Can Vision-Language Foundation Models Reason Compositionally? | |
| RN-50 (NegCLIP, CC-12M) | - | - | 41.4 | 82.0 | Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | - |
| RN-50 (MosaiCLIP, CC-12M) | - | - | 44.4 | 92.6 | Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | - |
| NegCLIP (CC-FT) | - | - | 37.5 | 53.1 | Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | - |
| Swin-T (MosaiCLIP, CC-12M) | - | - | 44.5 | 92.1 | Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | - |
| Swin-T (NegCLIP, CC-12M) | - | - | 39.6 | 80.3 | Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | - |