Image Sentence Alignment on VALSE Counting 2

Metrics

Accuracy (%)

Pairwise accuracy (%)
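
The two metrics can be illustrated with a minimal sketch. It assumes each benchmark instance pairs an image with a correct caption and a foil, and that the model returns an alignment score for each image-sentence pair. The function names, the 0.5 decision threshold, and the sample scores are hypothetical and are not taken from the benchmark's own code.

```python
from typing import Sequence


def pairwise_accuracy(caption_scores: Sequence[float],
                      foil_scores: Sequence[float]) -> float:
    """Percentage of pairs where the caption scores strictly higher than its foil."""
    if len(caption_scores) != len(foil_scores) or not caption_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    wins = sum(c > f for c, f in zip(caption_scores, foil_scores))
    return 100.0 * wins / len(caption_scores)


def accuracy(caption_scores: Sequence[float],
             foil_scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Percentage of sentences classified correctly.

    Assumption: a caption counts as correct when its score reaches the
    threshold, and a foil counts as correct when its score stays below it.
    """
    total = len(caption_scores) + len(foil_scores)
    if total == 0:
        raise ValueError("no scores given")
    correct = sum(c >= threshold for c in caption_scores)
    correct += sum(f < threshold for f in foil_scores)
    return 100.0 * correct / total


# Made-up scores for four caption/foil pairs.
captions = [0.9, 0.6, 0.4, 0.7]
foils = [0.2, 0.7, 0.3, 0.6]

pairwise = pairwise_accuracy(captions, foils)
overall = accuracy(captions, foils)
print(pairwise, overall)
```

Pairwise accuracy needs only a relative ordering of the two scores, so it can be reported for models that produce a score but no classification decision. That may be why some rows in the table list only that metric.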

Results

Performance results of various models on this benchmark

Model	Accuracy (%)	Pairwise accuracy (%)	Paper Title
ViLBERT 12-in-1	66.7	77.3	VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena
ViLBERT	51.8	73.7	VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena
GPT1	-	69.5	VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena
CLIP	-	57.5	VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena
VisualBERT	50.0	50.0	VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena
GPT2	-	45.3	VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena
LXMERT	49.9	42.6	VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena
