Reading Comprehension on MuSeRC
Metrics
- Average F1
- EM (exact match)
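To make the two columns concrete, here is a minimal scoring sketch. It assumes MuSeRC-style data in which each question has several candidate answers with binary labels: EM counts a question as correct only if every candidate answer is labeled correctly, and F1 is computed per question and then averaged. The official leaderboard scorer may differ in details (for example, computing F1 jointly over all answers), so treat this as illustrative.

```python
from collections import defaultdict

def muserc_metrics(examples):
    """Score (question_id, predicted_label, gold_label) triples with 0/1 labels,
    one triple per candidate answer of a question."""
    per_question = defaultdict(list)
    for qid, pred, gold in examples:
        per_question[qid].append((pred, gold))

    f1_sum, em_sum = 0.0, 0
    for pairs in per_question.values():
        tp = sum(p == 1 and g == 1 for p, g in pairs)
        fp = sum(p == 1 and g == 0 for p, g in pairs)
        fn = sum(p == 0 and g == 1 for p, g in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_sum += f1
        em_sum += all(p == g for p, g in pairs)  # exact match: every answer labeled correctly

    n = len(per_question)
    return {"average_f1": f1_sum / n, "em": em_sum / n}

# Toy example: question q1 is only partially correct, q2 matches exactly.
print(muserc_metrics([
    ("q1", 1, 1), ("q1", 0, 0), ("q1", 1, 0),
    ("q2", 0, 0), ("q2", 1, 1), ("q2", 1, 1),
]))  # -> {'average_f1': 0.833..., 'em': 0.5}
```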
Results
Performance of various models on this benchmark, sorted by Average F1.
Model Name | Average F1 | EM | Paper Title | Repository
---|---|---|---|---
Golden Transformer | 0.941 | 0.819 | - | -
MT5 Large | 0.844 | 0.543 | mT5: A massively multilingual pre-trained text-to-text transformer | -
ruRoberta-large finetune | 0.830 | 0.561 | - | -
ruT5-large-finetune | 0.815 | 0.537 | - | -
Human Benchmark | 0.806 | 0.420 | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | -
ruT5-base-finetune | 0.769 | 0.446 | - | -
ruBert-large finetune | 0.760 | 0.427 | - | -
ruBert-base finetune | 0.742 | 0.399 | - | -
RuGPT3XL few-shot | 0.740 | 0.546 | - | -
RuGPT3Large | 0.729 | 0.333 | - | -
RuGPT3Medium | 0.706 | 0.308 | - | -
RuBERT conversational | 0.687 | 0.278 | - | -
YaLM 1.0B few-shot | 0.673 | 0.364 | - | -
heuristic majority | 0.671 | 0.237 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | -
RuGPT3Small | 0.653 | 0.221 | - | -
SBERT_Large | 0.646 | 0.327 | - | -
SBERT_Large_mt_ru_finetuning | 0.642 | 0.319 | - | -
Multilingual Bert | 0.639 | 0.239 | - | -
Baseline TF-IDF1.1 | 0.587 | 0.242 | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | -
Random weighted | 0.450 | 0.071 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | -
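As a starting point for reproducing these numbers, the sketch below loads the MuSeRC data with the Hugging Face datasets library. It assumes the Russian SuperGLUE datasets are published on the Hub under the `russian_super_glue` name with a `muserc` configuration; verify the exact identifiers and field names against the dataset card before relying on them.

```python
from datasets import load_dataset

# Assumed dataset and config names for the Hub release of Russian SuperGLUE;
# check the dataset card if these identifiers have changed.
muserc = load_dataset("russian_super_glue", "muserc")

# Each example pairs a paragraph and question with one candidate answer and a
# binary label; inspect a record to see the exact field names and split sizes.
print(muserc["train"][0])
print(muserc)
```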