Reading Comprehension on RACE
Evaluation Metrics
Accuracy: overall accuracy on the combined RACE test set
Accuracy (High): accuracy on the RACE-H subset (high-school exam questions)
Accuracy (Middle): accuracy on the RACE-M subset (middle-school exam questions)
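As a rough illustration of how these three numbers relate, the sketch below scores multiple-choice predictions separately for the middle-school and high-school subsets and then over the combined set. The `examples` records and the option letters are hypothetical placeholders, not outputs of any system listed in the table below.

```python
# Minimal sketch (assumed setup, not from any listed paper): RACE accuracy is
# reported per subset (middle / high) plus an overall score over both subsets.
from collections import defaultdict


def subset_accuracies(examples):
    """examples: iterable of dicts with keys 'subset' ('middle' or 'high'),
    'prediction' and 'answer' (option letters such as 'A'..'D')."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        total[ex["subset"]] += 1
        correct[ex["subset"]] += int(ex["prediction"] == ex["answer"])
    acc = {s: correct[s] / total[s] for s in total}
    acc["all"] = sum(correct.values()) / sum(total.values())
    return acc


if __name__ == "__main__":
    # Hypothetical predictions; a real evaluation would use the RACE test set.
    examples = [
        {"subset": "middle", "prediction": "A", "answer": "A"},
        {"subset": "middle", "prediction": "C", "answer": "B"},
        {"subset": "high", "prediction": "D", "answer": "D"},
        {"subset": "high", "prediction": "B", "answer": "B"},
    ]
    print(subset_accuracies(examples))
    # e.g. {'middle': 0.5, 'high': 1.0, 'all': 0.75}
```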
Evaluation Results
Performance results for each model on this benchmark.
Comparison Table
Model | Accuracy | Accuracy (High) | Accuracy (Middle) |
---|---|---|---|
funnel-transformer-filtering-out-sequential | 85.7 | 84.4 | 88.8 |
megatron-lm-training-multi-billion-parameter | 89.5 | 88.6 | 91.8 |
language-models-are-few-shot-learners | - | 45.5 | - |
llama-open-and-efficient-foundation-language-1 | - | 48.3 | 64.1 |
llama-open-and-efficient-foundation-language-1 | - | 51.6 | 67.9 |
roberta-a-robustly-optimized-bert-pretraining | 83.2 | 81.3 | 86.5 |
megatron-lm-training-multi-billion-parameter | 90.9 | 90.0 | 93.1 |
deberta-decoding-enhanced-bert-with | 86.8 | - | - |
language-models-are-few-shot-learners | - | - | 58.4 |
improving-machine-reading-comprehension-with-2 | 91.4 | - | - |
bloomberggpt-a-large-language-model-for | - | 39.14 | 52.3 |
bloomberggpt-a-large-language-model-for | - | 34.33 | 41.23 |
bloomberggpt-a-large-language-model-for | - | 37.02 | 47.42 |
orca-2-teaching-small-language-models-how-to | 80.79 | - | - |
palm-scaling-language-modeling-with-pathways-1 | - | 42.3 | 57.9 |
xlnet-generalized-autoregressive-pretraining | - | 84.0 | 88.6 |
bloomberggpt-a-large-language-model-for | - | 41.74 | 54.32 |
palm-scaling-language-modeling-with-pathways-1 | - | 49.1 | 68.1 |
llama-open-and-efficient-foundation-language-1 | - | 46.9 | 61.1 |
orca-2-teaching-small-language-models-how-to | 82.87 | - | - |
llama-open-and-efficient-foundation-language-1 | - | 47.2 | 61.6 |
hierarchical-learning-for-generation-with | 67.3 | - | - |
palm-scaling-language-modeling-with-pathways-1 | - | 47.5 | 64.3 |
dual-multi-head-co-attention-for-multi-choice | 89.8 | 92.6 | 88.7 |