Question Answering
Question Answering on NewsQA
Evaluation Metrics: EM, F1

Evaluation Results
Performance results of each model on this benchmark
| Model Name | EM | F1 | Paper Title | Repository |
| --- | --- | --- | --- | --- |
| deepseek-r1 | 80.57 | 86.13 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | - |
| OpenAI/GPT-4o | 70.21 | 81.74 | GPT-4o as the Gold Standard: A Scalable and General Purpose Approach to Filter Language Model Pretraining Data | - |
| DecaProp | 53.1 | 66.3 | Densely Connected Attention Propagation for Reading Comprehension | - |
| FastQAExt | 43.7 | 56.1 | Making Neural QA as Simple as Possible but not Simpler | - |
| Riple/Saanvi-v0.1 | 72.61 | 85.44 | Time-series Transformer Generative Adversarial Networks | - |
| LinkBERT (large) | - | 72.6 | LinkBERT: Pretraining Language Models with Document Links | - |
| BERT+ASGen | 54.7 | 64.5 | - | - |
| Anthropic/claude-3-5-sonnet | 74.23 | 82.3 | Claude 3.5 Sonnet Model Card Addendum | - |
| xAI/grok-2-1212 | 70.57 | 88.24 | XAI for Transformers: Better Explanations through Conservative Propagation | - |
| OpenAI/o1-2024-12-17-high | 81.44 | 88.7 | 0/1 Deep Neural Networks via Block Coordinate Descent | - |
| Google/Gemini 1.5 Flash | 68.75 | 79.91 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | - |
| AMANDA | 48.4 | 63.7 | A Question-Focused Multi-Factor Attention Network for Question Answering | - |
| OpenAI/o3-mini-2025-01-31-high | 96.52 | 92.13 | o3-mini vs DeepSeek-R1: Which One is Safer? | - |
| DyREX | - | 68.53 | DyREx: Dynamic Query Representation for Extractive Question Answering | - |
| MINIMAL(Dyn) | 50.1 | 63.2 | Efficient and Robust Question Answering from Minimal Context over Documents | - |
| SpanBERT | - | 73.6 | SpanBERT: Improving Pre-training by Representing and Predicting Spans | - |