HyperAI超神经

Question Answering on NewsQA

Evaluation metrics

EM (exact match)
F1 (token-overlap F1)
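NewsQA results are commonly scored with SQuAD-style EM and F1, as in the MRQA shared task. This page does not state its exact scoring procedure, so the sketch below assumes the usual recipe, modeled loosely on the official SQuAD v1.1 evaluation script. Answers are normalized by lowercasing and by stripping punctuation, English articles, and extra whitespace. Each prediction is scored against its best-matching gold answer, and the per-question scores are averaged and scaled to 0-100. The function names (`normalize_answer`, `exact_match`, `f1_score`, `evaluate`) are illustrative and hypothetical.

```python
import re
import string
from collections import Counter

# Assumption: SQuAD-style normalization; the leaderboard does not document its own.
PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, ground_truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions, references):
    """Corpus-level (EM, F1) in percent; each reference is a list of gold answers."""
    em_total = f1_total = 0.0
    for pred, golds in zip(predictions, references):
        # Take the best score over all acceptable gold answers for this question.
        em_total += max(exact_match(pred, g) for g in golds)
        f1_total += max(f1_score(pred, g) for g in golds)
    n = len(predictions)
    return 100.0 * em_total / n, 100.0 * f1_total / n


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower"))  # 1.0
    print(f1_score("in Paris, France", "Paris"))            # 0.5
```

Under this recipe, "in Paris, France" against the gold answer "Paris" earns no EM credit but still gets partial F1 credit of 0.5 (precision 1/3, recall 1). This is why F1 is usually the higher of the two columns.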

Evaluation results

Results of each model on this benchmark

Comparison table
Model name                                     EM      F1
deepseek-r1-incentivizing-reasoning            80.57   86.13
sieve-general-purpose-data-filtering-system    70.21   81.74
densely-connected-attention-propagation-for    53.1    66.3
making-neural-qa-as-simple-as-possible-but     43.7    56.1
time-series-transformer-generative             72.61   85.44
linkbert-pretraining-language-models-with      -       72.6
learning-to-generate-questions-by-learning-to  54.7    64.5
claude-3-5-sonnet-model-card-addendum          74.23   82.3
xai-for-transformers-better-explanations       70.57   88.24
0-1-deep-neural-networks-via-block-coordinate  81.44   88.7
gemini-1-5-unlocking-multimodal-understanding  68.75   79.91
a-question-focused-multi-factor-attention      48.4    63.7
o3-mini-vs-deepseek-r1-which-one-is-safer      96.52   92.13
dyrex-dynamic-query-representation-for         -       68.53
efficient-and-robust-question-answering-from   50.1    63.2
spanbert-improving-pre-training-by             -       73.6

(- indicates that no EM value is listed.)