Common Sense Reasoning on RuCoS
Evaluation Metrics
Average F1
EM (Exact Match)
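Both metrics are computed per query and averaged over the dataset, in the ReCoRD-style fashion typically used for RuCoS: a prediction is compared against every acceptable gold answer, EM checks for an exact string match, and F1 measures token overlap, keeping the best score per query. The sketch below is a minimal illustration under those assumptions; the `normalize`, `exact_match`, and `token_f1` helpers are illustrative names, not the official RuCoS scorer, whose normalization rules may differ.

```python
from collections import Counter

def normalize(text: str) -> str:
    # Illustrative normalization: lowercase and collapse whitespace.
    # The official scorer may apply additional rules (e.g. punctuation stripping).
    return " ".join(text.lower().split())

def exact_match(prediction: str, gold_answers: list[str]) -> float:
    # EM is 1.0 if the prediction matches any gold answer exactly after normalization.
    pred = normalize(prediction)
    return float(any(pred == normalize(g) for g in gold_answers))

def token_f1(prediction: str, gold_answers: list[str]) -> float:
    # Token-level F1 against each gold answer; the best score over the set is kept.
    pred_tokens = normalize(prediction).split()
    best = 0.0
    for gold in gold_answers:
        gold_tokens = normalize(gold).split()
        common = Counter(pred_tokens) & Counter(gold_tokens)
        overlap = sum(common.values())
        if overlap == 0:
            continue
        precision = overlap / len(pred_tokens)
        recall = overlap / len(gold_tokens)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best

# Example: one query with two acceptable gold entity mentions.
print(exact_match("Владимир Путин", ["Путин", "Владимир Путин"]))  # 1.0
print(token_f1("Владимир Путин", ["Путин"]))                       # ~0.667
```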
Evaluation Results
Performance of each model on this benchmark:
Model Name | Average F1 | EM | Paper Title | Repository |
---|---|---|---|---|
ruBert-base finetune | 0.74 | 0.716 | - | - |
RuGPT3Large | 0.21 | 0.202 | - | - |
Multilingual Bert | 0.29 | 0.29 | - | - |
ruBert-large finetune | 0.68 | 0.658 | - | - |
Golden Transformer | 0.92 | 0.924 | - | - |
ruRoberta-large finetune | 0.73 | 0.716 | - | - |
YaLM 1.0B few-shot | 0.86 | 0.859 | - | - |
RuGPT3Small | 0.21 | 0.204 | - | - |
RuGPT3XL few-shot | 0.67 | 0.665 | - | - |
MT5 Large | 0.57 | 0.562 | mT5: A massively multilingual pre-trained text-to-text transformer | - |
Human Benchmark | 0.93 | 0.89 | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | - |
ruT5-base-finetune | 0.79 | 0.752 | - | - |
majority_class | 0.25 | 0.247 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | - |
RuGPT3Medium | 0.23 | 0.224 | - | - |
Baseline TF-IDF1.1 | 0.26 | 0.252 | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | - |
heuristic majority | 0.26 | 0.257 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | - |
RuBERT plain | 0.32 | 0.314 | - | - |
SBERT_Large_mt_ru_finetuning | 0.35 | 0.347 | - | - |
SBERT_Large | 0.36 | 0.351 | - | - |
ruT5-large-finetune | 0.81 | 0.764 | - | - |