Common Sense Reasoning on RuCoS

Metrics

Average F1
EM (exact match)
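
Average F1 and EM are typically computed SQuAD/ReCoRD-style for RuCoS: token-level F1 and exact match against each example's gold answers, taking the best score when several answer alternatives are acceptable and averaging over the dataset. The sketch below illustrates this under those assumptions; the function names (`f1_score`, `exact_match`, `evaluate`) are illustrative, not the benchmark's official evaluation code.

```python
# Minimal sketch of RuCoS-style metrics: token-level F1 and exact
# match, maximized over gold answer alternatives per example.
# Illustrative only; not the official RussianSuperGLUE evaluator.

from collections import Counter


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted and a gold answer string."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings match exactly, else 0.0."""
    return float(prediction.lower().strip() == gold.lower().strip())


def evaluate(predictions: list[str],
             gold_answers: list[list[str]]) -> tuple[float, float]:
    """Average F1 and EM over a dataset; each example may have several
    acceptable gold answers, and the best-matching one counts."""
    f1 = sum(max(f1_score(p, g) for g in golds)
             for p, golds in zip(predictions, gold_answers))
    em = sum(max(exact_match(p, g) for g in golds)
             for p, golds in zip(predictions, gold_answers))
    n = len(predictions)
    return f1 / n, em / n
```

For example, `evaluate(["Путин"], [["Владимир Путин", "Путин"]])` returns `(1.0, 1.0)`, since the prediction exactly matches one of the acceptable gold answers.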

Results

Performance results of various models on this benchmark

| Model Name | Average F1 | EM | Paper Title | Repository |
|---|---|---|---|---|
| ruBert-base finetune | 0.74 | 0.716 | - | - |
| RuGPT3Large | 0.21 | 0.202 | - | - |
| Multilingual Bert | 0.29 | 0.29 | - | - |
| ruBert-large finetune | 0.68 | 0.658 | - | - |
| Golden Transformer | 0.92 | 0.924 | - | - |
| ruRoberta-large finetune | 0.73 | 0.716 | - | - |
| YaLM 1.0B few-shot | 0.86 | 0.859 | - | - |
| RuGPT3Small | 0.21 | 0.204 | - | - |
| RuGPT3XL few-shot | 0.67 | 0.665 | - | - |
| MT5 Large | 0.57 | 0.562 | mT5: A massively multilingual pre-trained text-to-text transformer | - |
| Human Benchmark | 0.93 | 0.89 | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | - |
| ruT5-base-finetune | 0.79 | 0.752 | - | - |
| majority_class | 0.25 | 0.247 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | - |
| RuGPT3Medium | 0.23 | 0.224 | - | - |
| Baseline TF-IDF1.1 | 0.26 | 0.252 | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | - |
| heuristic majority | 0.26 | 0.257 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | - |
| RuBERT plain | 0.32 | 0.314 | - | - |
| SBERT_Large_mt_ru_finetuning | 0.35 | 0.347 | - | - |
| SBERT_Large | 0.36 | 0.351 | - | - |
| ruT5-large-finetune | 0.81 | 0.764 | - | - |