HyperAI
Question Answering on NewsQA
Metrics: EM (Exact Match), F1
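For reference, EM and F1 for extractive QA benchmarks such as NewsQA are typically computed with SQuAD-style answer normalization (lowercasing, removing punctuation and articles) followed by string equality for EM and token-level overlap for F1. A minimal sketch, assuming that convention applies here:

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and
    articles (a/an/the), collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred: str, gold: str) -> float:
    """EM: 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))

def f1_score(pred: str, gold: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over
    the multiset of overlapping tokens."""
    pred_toks = normalize(pred).split()
    gold_toks = normalize(gold).split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```

On a benchmark, both scores are averaged over all questions (and, where multiple gold answers exist, the maximum over gold answers is taken per question).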
Results
Performance results of various models on this benchmark
| Model Name | EM | F1 | Paper Title | Repository |
|---|---|---|---|---|
| deepseek-r1 | 80.57 | 86.13 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | - |
| OpenAI/GPT-4o | 70.21 | 81.74 | GPT-4o as the Gold Standard: A Scalable and General Purpose Approach to Filter Language Model Pretraining Data | - |
| DecaProp | 53.1 | 66.3 | Densely Connected Attention Propagation for Reading Comprehension | - |
| FastQAExt | 43.7 | 56.1 | Making Neural QA as Simple as Possible but not Simpler | - |
| Riple/Saanvi-v0.1 | 72.61 | 85.44 | Time-series Transformer Generative Adversarial Networks | - |
| LinkBERT (large) | - | 72.6 | LinkBERT: Pretraining Language Models with Document Links | - |
| BERT+ASGen | 54.7 | 64.5 | - | - |
| Anthropic/claude-3-5-sonnet | 74.23 | 82.3 | Claude 3.5 Sonnet Model Card Addendum | - |
| xAI/grok-2-1212 | 70.57 | 88.24 | XAI for Transformers: Better Explanations through Conservative Propagation | - |
| OpenAI/o1-2024-12-17-high | 81.44 | 88.7 | 0/1 Deep Neural Networks via Block Coordinate Descent | - |
| Google/Gemini 1.5 Flash | 68.75 | 79.91 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | - |
| AMANDA | 48.4 | 63.7 | A Question-Focused Multi-Factor Attention Network for Question Answering | - |
| OpenAI/o3-mini-2025-01-31-high | 96.52 | 92.13 | o3-mini vs DeepSeek-R1: Which One is Safer? | - |
| DyREX | - | 68.53 | DyREx: Dynamic Query Representation for Extractive Question Answering | - |
| MINIMAL(Dyn) | 50.1 | 63.2 | Efficient and Robust Question Answering from Minimal Context over Documents | - |
| SpanBERT | - | 73.6 | SpanBERT: Improving Pre-training by Representing and Predicting Spans | - |