Natural Language Inference on RCB
Metrics
Accuracy
Average F1
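
Both metrics are standard classification scores. Below is a minimal sketch of how they could be computed with scikit-learn, assuming RCB's three-way NLI labels (entailment / contradiction / neutral) and interpreting "Average F1" as the unweighted (macro) mean of per-class F1, as Russian SuperGLUE reports it; the example labels are illustrative only.

```python
# Minimal metric sketch: accuracy and macro-averaged F1 over three NLI classes.
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical gold labels and model predictions, for illustration only.
y_true = ["entailment", "neutral", "contradiction", "neutral", "entailment"]
y_pred = ["entailment", "neutral", "neutral", "neutral", "contradiction"]

accuracy = accuracy_score(y_true, y_pred)
# average="macro" takes the unweighted mean of the per-class F1 scores.
average_f1 = f1_score(y_true, y_pred, average="macro")

print(f"Accuracy:   {accuracy:.3f}")
print(f"Average F1: {average_f1:.3f}")
```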
Results
Performance of various models on the RCB (Russian Commitment Bank) natural language inference benchmark, reported as accuracy and average F1.
Comparison Table
| Model Name | Accuracy | Average F1 |
| --- | --- | --- |
| Model 1 | 0.418 | 0.302 |
| Model 2 | 0.518 | 0.357 |
| Model 3 | 0.546 | 0.406 |
| Model 4 | 0.463 | 0.367 |
| Model 5 | 0.498 | 0.306 |
| russiansuperglue-a-russian-language | 0.702 | 0.680 |
| Model 7 | 0.509 | 0.333 |
| Model 8 | 0.484 | 0.417 |
| Model 9 | 0.473 | 0.356 |
| Model 10 | 0.447 | 0.408 |
| Model 11 | 0.452 | 0.371 |
| Model 12 | 0.445 | 0.367 |
| mt5-a-massively-multilingual-pre-trained-text | 0.454 | 0.366 |
| Model 14 | 0.500 | 0.356 |
| Model 15 | 0.486 | 0.351 |
| Model 16 | 0.468 | 0.307 |
| unreasonable-effectiveness-of-rule-based | 0.438 | 0.400 |
| unreasonable-effectiveness-of-rule-based | 0.374 | 0.319 |
| Model 19 | 0.461 | 0.372 |
| Model 20 | 0.484 | 0.452 |
| russiansuperglue-a-russian-language | 0.441 | 0.301 |
| unreasonable-effectiveness-of-rule-based | 0.484 | 0.217 |
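
Scores in this style can be reproduced by running a model over an RCB split and applying the same two metrics. The sketch below is hypothetical: the Hugging Face dataset id `russian_super_glue`, its `rcb` config, and the field names `premise`, `hypothesis`, and `label` are assumptions not confirmed by this page, and the placeholder predictor must be swapped for a real classifier.

```python
# Hypothetical RCB evaluation loop; dataset id, config, and field names
# below are assumptions, not confirmed by this leaderboard.
from datasets import load_dataset
from sklearn.metrics import accuracy_score, f1_score

# Labeled validation split is used since leaderboard test labels are
# typically hidden (an assumption about this benchmark's setup).
rcb = load_dataset("russian_super_glue", "rcb", split="validation")

def predict(premise: str, hypothesis: str) -> int:
    # Placeholder model: always predicts class id 0.
    # Replace with a real NLI classifier to get meaningful scores.
    return 0

y_true = [ex["label"] for ex in rcb]
y_pred = [predict(ex["premise"], ex["hypothesis"]) for ex in rcb]

print("Accuracy:  ", round(accuracy_score(y_true, y_pred), 3))
print("Average F1:", round(f1_score(y_true, y_pred, average="macro"), 3))
```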