Question Answering on DROP (test set)
Metrics
F1
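Scores below are token-level F1 between predicted and gold answer strings. A minimal sketch of a bag-of-tokens F1, assuming simple whitespace tokenization; `token_f1` is an illustrative helper, and the official DROP scorer additionally normalizes articles and punctuation, matches numbers, and aligns multi-span answers.

```python
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    """Bag-of-tokens F1 between a predicted and a gold answer string.

    Illustrative sketch only: the official DROP metric also normalizes
    text, compares numbers numerically, and handles multi-span answers.
    """
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: partial overlap earns partial credit.
print(token_f1("76 yards", "76"))  # ~0.667
```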
Results
Performance of various models on this benchmark.
Comparison table
| Model name | F1 |
|---|---|
| question-directed-graph-attention-network-for | 88.38 |
| reasoning-like-program-executors-1 | 87.6 |
| palm-2-technical-report-1 | 85.0 |
| giving-bert-a-calculator-finding-operations | 81.78 |
| neural-symbolic-reader-scalable-integration | 81.71 |
| gpt-4-technical-report-1 | 80.9 |
| tag-based-multi-span-extraction-in-reading | 80.7 |
| a-multi-type-multi-span-network-for-reading | 79.88 |
| injecting-numerical-reasoning-skills-into | 72.4 |
| numnet-machine-reading-comprehension-with | 67.97 |
| gpt-4-technical-report-1 | 64.1 |
| orca-2-teaching-small-language-models-how-to | 60.26 |
| orca-2-teaching-small-language-models-how-to | 57.97 |
| drop-a-reading-comprehension-benchmark | 47.01 |
| language-models-are-few-shot-learners | 36.5 |
| drop-a-reading-comprehension-benchmark | 32.7 |