Question Answering on NewsQA
Evaluation Metrics: EM, F1
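The page lists the metrics but not the scoring procedure. As an illustration only, here is a minimal sketch assuming the standard SQuAD-style token-level definitions commonly used for extractive QA; the `normalize`, `exact_match`, and `f1_score` helpers below are hypothetical and not taken from this leaderboard's evaluation code.

```python
# Sketch of SQuAD-style EM / F1 scoring for extractive QA (assumed, not from this page).
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, reference: str) -> float:
    """EM: 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))

def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: a near-miss answer earns partial F1 credit but no EM credit.
print(exact_match("the White House", "White House"))  # 1.0 (article stripped)
print(f1_score("President Obama", "Barack Obama"))    # 0.5
```

Leaderboard scores are then the average EM and F1 over all question–answer pairs, typically taking the maximum score over the available reference answers for each question.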
Evaluation Results
Performance results of each model on this benchmark.
| Model Name | EM | F1 | Paper Title | Repository |
| --- | --- | --- | --- | --- |
| deepseek-r1 | 80.57 | 86.13 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | |
| OpenAI/GPT-4o | 70.21 | 81.74 | GPT-4o as the Gold Standard: A Scalable and General Purpose Approach to Filter Language Model Pretraining Data | - |
| DecaProp | 53.1 | 66.3 | Densely Connected Attention Propagation for Reading Comprehension | |
| FastQAExt | 43.7 | 56.1 | Making Neural QA as Simple as Possible but not Simpler | |
| Riple/Saanvi-v0.1 | 72.61 | 85.44 | Time-series Transformer Generative Adversarial Networks | |
| LinkBERT (large) | - | 72.6 | LinkBERT: Pretraining Language Models with Document Links | |
| BERT+ASGen | 54.7 | 64.5 | - | - |
| Anthropic/claude-3-5-sonnet | 74.23 | 82.3 | Claude 3.5 Sonnet Model Card Addendum | - |
| xAI/grok-2-1212 | 70.57 | 88.24 | XAI for Transformers: Better Explanations through Conservative Propagation | |
| OpenAI/o1-2024-12-17-high | 81.44 | 88.7 | 0/1 Deep Neural Networks via Block Coordinate Descent | - |
| Google/Gemini 1.5 Flash | 68.75 | 79.91 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | |
| AMANDA | 48.4 | 63.7 | A Question-Focused Multi-Factor Attention Network for Question Answering | |
| OpenAI/o3-mini-2025-01-31-high | 96.52 | 92.13 | o3-mini vs DeepSeek-R1: Which One is Safer? | |
| DyREX | - | 68.53 | DyREx: Dynamic Query Representation for Extractive Question Answering | |
| MINIMAL(Dyn) | 50.1 | 63.2 | Efficient and Robust Question Answering from Minimal Context over Documents | |
| SpanBERT | - | 73.6 | SpanBERT: Improving Pre-training by Representing and Predicting Spans | |