Question Answering on NewsQA
Metrics
EM
F1
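Both metrics are typically computed SQuAD-style: Exact Match (EM) checks whether the normalized predicted answer string equals the normalized reference, while F1 measures token-level overlap between prediction and reference. As a hedged sketch (assuming NewsQA follows the standard SQuAD normalization: lowercasing, stripping punctuation and English articles, collapsing whitespace), the two metrics can be implemented as:

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and
    articles (a/an/the), collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, reference: str) -> float:
    """EM: 1.0 if normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))

def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1: harmonic mean of precision and recall
    over the bag of normalized tokens."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `exact_match("The Eiffel Tower", "eiffel tower")` returns 1.0 because article removal makes the normalized strings identical, while a prediction with extra tokens still earns partial F1 credit. In benchmark reporting, both scores are averaged over all questions and shown as percentages, as in the table below.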
Results
Performance results of various models on this benchmark
Comparison table
| Model name | EM | F1 |
|---|---|---|
| deepseek-r1-incentivizing-reasoning | 80.57 | 86.13 |
| sieve-general-purpose-data-filtering-system | 70.21 | 81.74 |
| densely-connected-attention-propagation-for | 53.1 | 66.3 |
| making-neural-qa-as-simple-as-possible-but | 43.7 | 56.1 |
| time-series-transformer-generative | 72.61 | 85.44 |
| linkbert-pretraining-language-models-with | - | 72.6 |
| learning-to-generate-questions-by-learning-to | 54.7 | 64.5 |
| claude-3-5-sonnet-model-card-addendum | 74.23 | 82.3 |
| xai-for-transformers-better-explanations | 70.57 | 88.24 |
| 0-1-deep-neural-networks-via-block-coordinate | 81.44 | 88.7 |
| gemini-1-5-unlocking-multimodal-understanding | 68.75 | 79.91 |
| a-question-focused-multi-factor-attention | 48.4 | 63.7 |
| o3-mini-vs-deepseek-r1-which-one-is-safer | 96.52 | 92.13 |
| dyrex-dynamic-query-representation-for | - | 68.53 |
| efficient-and-robust-question-answering-from | 50.1 | 63.2 |
| spanbert-improving-pre-training-by | - | 73.6 |