Common Sense Reasoning on RuCoS

Metrics

Average F1
EM (exact match)
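
Average F1 and EM are typically computed SQuAD/ReCoRD-style for RuCoS: token-level F1 and exact match against each example's gold answers, taking the best score when several answer alternatives are acceptable and averaging over the dataset. The sketch below illustrates this under those assumptions; the function names (`f1_score`, `exact_match`, `evaluate`) are illustrative, not the benchmark's official evaluation code.

```python
# Minimal sketch of RuCoS-style metrics: token-level F1 and exact
# match, maximized over gold answer alternatives per example.
# Illustrative only; not the official RussianSuperGLUE evaluator.

from collections import Counter


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted and a gold answer string."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings match exactly, else 0.0."""
    return float(prediction.lower().strip() == gold.lower().strip())


def evaluate(predictions: list[str],
             gold_answers: list[list[str]]) -> tuple[float, float]:
    """Average F1 and EM over a dataset; each example may have several
    acceptable gold answers, and the best-matching one counts."""
    f1 = sum(max(f1_score(p, g) for g in golds)
             for p, golds in zip(predictions, gold_answers))
    em = sum(max(exact_match(p, g) for g in golds)
             for p, golds in zip(predictions, gold_answers))
    n = len(predictions)
    return f1 / n, em / n
```

For example, `evaluate(["Путин"], [["Владимир Путин", "Путин"]])` returns `(1.0, 1.0)`, since the prediction exactly matches one of the acceptable gold answers.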

Results

Performance results of various models on this benchmark

| Model Name | Average F1 | EM | Paper Title | Repository |
|---|---|---|---|---|
| ruBert-base finetune | 0.74 | 0.716 | - | - |
| RuGPT3Large | 0.21 | 0.202 | - | - |
| Multilingual Bert | 0.29 | 0.29 | - | - |
| ruBert-large finetune | 0.68 | 0.658 | - | - |
| Golden Transformer | 0.92 | 0.924 | - | - |
| ruRoberta-large finetune | 0.73 | 0.716 | - | - |
| YaLM 1.0B few-shot | 0.86 | 0.859 | - | - |
| RuGPT3Small | 0.21 | 0.204 | - | - |
| RuGPT3XL few-shot | 0.67 | 0.665 | - | - |
| MT5 Large | 0.57 | 0.562 | mT5: A massively multilingual pre-trained text-to-text transformer | - |
| Human Benchmark | 0.93 | 0.89 | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | - |
| ruT5-base-finetune | 0.79 | 0.752 | - | - |
| majority_class | 0.25 | 0.247 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | - |
| RuGPT3Medium | 0.23 | 0.224 | - | - |
| Baseline TF-IDF1.1 | 0.26 | 0.252 | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | - |
| heuristic majority | 0.26 | 0.257 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | - |
| RuBERT plain | 0.32 | 0.314 | - | - |
| SBERT_Large_mt_ru_finetuning | 0.35 | 0.347 | - | - |
| SBERT_Large | 0.36 | 0.351 | - | - |
| ruT5-large-finetune | 0.81 | 0.764 | - | - |