HyperAI

Question Answering

Benchmark List

All benchmarks related to this task

jd-product-question-answer
Best model: PAAG

aristo-kaggle-allen-ai-8th-grade-questions
Best model: Cardal

aviationqa
Best model: KGT5

babi
Best model: STM

bioasq
Best model: BioLinkBERT (large)

blurb
Best model: BioLinkBERT (large)

boolq
Best model: Gemma-7B

casehold
Best model: Custom Legal-BERT

catbabi-lm-mode
Best model: Fast Weight Memory

catbabi
Best model: Fast Weight Memory

chaii-hindi-and-tamil-question-answering
Best model: MuCoT

children-s-book-test
Best model: NSE

clicr
Best model: Gated-Attention Reader

codah
Best model: G-DAUG-Combo + RoBERTa-Large

complex-cronquestions
Best model: SubGTR

complexquestions
Best model: WebQA

complexwebquestions
Best model: TOME-2

conditionalqa
Best model: FiD

copa
Best model: PaLM 540B (finetuned)

coqa
Best model: GPT-3 175B (few-shot, k=32)

drop-test
Best model: QDGAT (ensemble)

duorc
Best model: Vector Database (ChromaDB)

fairytaleqa
Best model: BART fine-tuned on FairytaleQA

finqa
Best model: ELASTIC (RoBERTa-large)

geoquestions1089
Best model: GeoQA2

graphquestions
Best model: ChatGPT

hotpotqa
Best model: Beam Retrieval

hotpotqa-beir
Best model: BM25+CE

hybridqa
Best model: MAFiD

jaquad
Best model: BERT-Japanese

mapeval-api
Best model: Claude-3.5-Sonnet (ReAct)

mathematics
Best model: TP-Transformer

mctest-160
Best model: syntax, frame, coreference, and word embedding features

medmcqa-dev
Best model: MedMobile (3.8B)

medqa-usmle
Best model: DRAGON + BioLinkBERT

metaqa
Best model: T5-small+prolog

mrqa-out-of-domain
Best model: RGX

multirc
Best model: PaLM 540B (finetuned)

multispanqa
Best model: RoBERTa-large Tagger + LIQUID (Ensemble)

narrativeqa
Best model: Masque (NarrativeQA + MS MARCO)

natural-questions
Best model: Atlas (full, Wiki-dec-2018 index)

natural-questions-long
Best model: DensePhrases

naturalqa
Best model: DPR

newsqa
Best model: OpenAI/o3-mini-2025-01-31-high

obqa
Best model: FLAN 137B (zero-shot)

ott-qa
Best model: Fusion Retriever+ETC

peerqa
Best model: GPT-4o-2024-08-06-128k

piqa
Best model: LLaMA 65B (0-shot)

popqa
Best model: SelfRAG-7b

pubchemqa
Best model: BioMedGPT-10B

pubmedqa
Best model: BioGPT-Large(1.5B)

qasent
Best model: Attentive LSTM

qasper
Best model: Longformer Encoder Decoder (base)

quac
Best model: FlowQA (single model)

quora-question-pairs
Best model: DeBERTa (large)

recipeqa
Best model: multimodal+LXMERT+ConstrainedMaxPooling

reclor
Best model: XLNet-large

semevalcqa
Best model: HyperQA

social-iqa
Best model: LLaMA 65B (zero-shot)

sqa3d
Best model: CREMA

squad1-1
Best model: LUKE

squad1-1-dev
Best model: T5-11B

squad2-0-dev
Best model: XLNet (single model)

stepgame
Best model: TP-MANN

story-cloze
Best model: Neo-6B (QA + WS)

storycloze
Best model: BLOOMZ

strategyqa
Best model: PaLM 2 (few-shot, CoT, SC)

swag
Best model: DeBERTaV3large

tat-qa
Best model: TagOp

tempquestions
Best model: QAap

torque
Best model: ECONET

trecqa
Best model: TANDA DeBERTa-V3-Large + ALL

triviaqa
Best model: PaLM 2-L (one-shot)

truthfulqa
Best model: CoA

tweetqa
Best model: ByT5

vnhsge-civic
Best model: Bing Chat

webquestions
Best model: FiE+PAQ

webquestionssp
Best model: ChatGPT

wikihop
Best model: BigBird-etc

wikiqa
Best model: TANDA-RoBERTa (ASNQ, WikiQA)

wikitablequestions
Best model: TabSQLify (col+row)

yahoocqa
Best model: sMIM (1024) +

adversarial-qa

agi-eval

ai2-kaggle-dataset

bamboogle

bbh

chegeka

cnn-daily-mail

coco-visual-question-answering-vqa-real-1

convfinqa

cronquestions

danetqa

drop

efficientqa-dev

efficientqa-test

egotaskqa

fever

fiqa-2018-beir

fquad

friendsqa

hellaswag

kilt-eli5

kqa-pro

mapeval-textual

mctest-500

medturkquad-medical-turkish-question

mmlu

molweni

mrqa-2019

ms-marco

muld-hotpotqa

muld-narrativeqa

multiq

multitq

next-qa-open-ended-videoqa

nq-beir

openbookqa

quality

quasart-t

race

reverb

ruopenbookqa

sberquad

scde

schizzosquad

simplequestions

squad

squad-adversarial

squad-v2

squad2-0

squadshifts-amazon

squadshifts-new-wiki

squadshifts-nyt

squadshifts-reddit

tempqa-wd

timequestions

tiq

uniprotqa

vnhsge-biology

vnhsge-chemistry

vnhsge-english

vnhsge-geography

vnhsge-history

vnhsge-literature

vnhsge-mathematics-1

vnhsge-physics

websrc

wikisql
