Question Answering
Benchmark List
All benchmarks related to this task
jd-product-question-answer
Best model: PAAG
aristo-kaggle-allen-ai-8th-grade-questions
Best model: Cardal
aviationqa
Best model: KGT5
babi
Best model: STM
bioasq
Best model: BioLinkBERT (large)
blurb
Best model: BioLinkBERT (large)
boolq
Best model: Gemma-7B
casehold
Best model: Custom Legal-BERT
catbabi-lm-mode
Best model: Fast Weight Memory
catbabi
Best model: Fast Weight Memory
chaii-hindi-and-tamil-question-answering
Best model: MuCoT
children-s-book-test
Best model: NSE
clicr
Best model: Gated-Attention Reader
codah
Best model: G-DAUG-Combo + RoBERTa-Large
complex-cronquestions
Best model: SubGTR
complexquestions
Best model: WebQA
complexwebquestions
Best model: TOME-2
conditionalqa
Best model: FiD
copa
Best model: PaLM 540B (finetuned)
coqa
Best model: GPT-3 175B (few-shot, k=32)
drop-test
Best model: QDGAT (ensemble)
duorc
Best model: Vector Database (ChromaDB)
fairytaleqa
Best model: BART fine-tuned on FairytaleQA
finqa
Best model: ELASTIC (RoBERTa-large)
geoquestions1089
Best model: GeoQA2
graphquestions
Best model: ChatGPT
hotpotqa
Best model: Beam Retrieval
hotpotqa-beir
Best model: BM25+CE
hybridqa
Best model: MAFiD
jaquad
Best model: BERT-Japanese
mapeval-api
Best model: Claude-3.5-Sonnet (ReAct)
mathematics
Best model: TP-Transformer
mctest-160
Best model: syntax, frame, coreference, and word embedding features
medmcqa-dev
Best model: MedMobile (3.8B)
medqa-usmle
Best model: DRAGON + BioLinkBERT
metaqa
Best model: T5-small+prolog
mrqa-out-of-domain
Best model: RGX
multirc
Best model: PaLM 540B (finetuned)
multispanqa
Best model: RoBERTa-large Tagger + LIQUID (Ensemble)
narrativeqa
Best model: Masque (NarrativeQA + MS MARCO)
natural-questions
Best model: Atlas (full, Wiki-dec-2018 index)
natural-questions-long
Best model: DensePhrases
naturalqa
Best model: DPR
newsqa
Best model: OpenAI/o3-mini-2025-01-31-high
obqa
Best model: FLAN 137B (zero-shot)
ott-qa
Best model: Fusion Retriever+ETC
peerqa
Best model: GPT-4o-2024-08-06-128k
piqa
Best model: LLaMA 65B (0-shot)
popqa
Best model: SelfRAG-7b
pubchemqa
Best model: BioMedGPT-10B
pubmedqa
Best model: BioGPT-Large(1.5B)
qasent
Best model: Attentive LSTM
qasper
Best model: Longformer Encoder Decoder (base)
quac
Best model: FlowQA (single model)
quora-question-pairs
Best model: DeBERTa (large)
recipeqa
Best model: multimodal+LXMERT+ConstrainedMaxPooling
reclor
Best model: XLNet-large
semevalcqa
Best model: HyperQA
social-iqa
Best model: LLaMA 65B (zero-shot)
sqa3d
Best model: CREMA
squad1-1
Best model: LUKE
squad1-1-dev
Best model: T5-11B
squad2-0-dev
Best model: XLNet (single model)
stepgame
Best model: TP-MANN
story-cloze
Best model: Neo-6B (QA + WS)
storycloze
Best model: BLOOMZ
strategyqa
Best model: PaLM 2 (few-shot, CoT, SC)
swag
Best model: DeBERTaV3large
tat-qa
Best model: TagOp
tempquestions
Best model: QAap
torque
Best model: ECONET
trecqa
Best model: TANDA DeBERTa-V3-Large + ALL
triviaqa
Best model: PaLM 2-L (one-shot)
truthfulqa
Best model: CoA
tweetqa
Best model: ByT5
vnhsge-civic
Best model: Bing Chat
webquestions
Best model: FiE+PAQ
webquestionssp
Best model: ChatGPT
wikihop
Best model: BigBird-etc
wikiqa
Best model: TANDA-RoBERTa (ASNQ, WikiQA)
wikitablequestions
Best model: TabSQLify (col+row)
yahoocqa
Best model: sMIM (1024) +
adversarial-qa
agi-eval
ai2-kaggle-dataset
bamboogle
bbh
chegeka
cnn-daily-mail
coco-visual-question-answering-vqa-real-1
convfinqa
cronquestions
danetqa
drop
efficientqa-dev
efficientqa-test
egotaskqa
fever
fiqa-2018-beir
fquad
friendsqa
hellaswag
kilt-eli5
kqa-pro
mapeval-textual
mctest-500
medturkquad-medical-turkish-question
mmlu
molweni
mrqa-2019
ms-marco
muld-hotpotqa
muld-narrativeqa
multiq
multitq
next-qa-open-ended-videoqa
nq-beir
openbookqa
quality
quasart-t
race
reverb
ruopenbookqa
sberquad
scde
schizzosquad
simplequestions
squad
squad-adversarial
squad-v2
squad2-0
squadshifts-amazon
squadshifts-new-wiki
squadshifts-nyt
squadshifts-reddit
tempqa-wd
timequestions
tiq
uniprotqa
vnhsge-biology
vnhsge-chemistry
vnhsge-english
vnhsge-geography
vnhsge-history
vnhsge-literature
vnhsge-mathematics-1
vnhsge-physics
websrc
wikisql