Multiple Choice Question Answering (MCQA) On 21
Metrics: Dev Set (Acc-%), Test Set (Acc-%)

Performance results of various models on this benchmark (all accuracies shown as percentages; "-" means the paper did not report that split):

| Model Name | Dev Set (Acc-%) | Test Set (Acc-%) | Paper Title |
| --- | --- | --- | --- |
| Meditron-70B (CoT + SC) | 66.0 | - | MEDITRON-70B: Scaling Medical Pretraining for Large Language Models |
| Codex 5-shot CoT | 59.7 | 62.7 | Can large language models reason about medical questions? |
| VOD (BioLinkBERT) | 58.3 | 62.9 | Variational Open-Domain Question Answering |
| Flan-PaLM (540B, SC) | 57.6 | - | Large Language Models Encode Clinical Knowledge |
| Flan-PaLM (540B, Few-shot) | 56.5 | - | Large Language Models Encode Clinical Knowledge |
| PaLM (540B, Few-shot) | 54.5 | - | Large Language Models Encode Clinical Knowledge |
| Flan-PaLM (540B, CoT) | 53.6 | - | Large Language Models Encode Clinical Knowledge |
| GAL 120B (zero-shot) | 52.9 | - | Galactica: A Large Language Model for Science |
| Flan-PaLM (62B, Few-shot) | 46.2 | - | Large Language Models Encode Clinical Knowledge |
| PaLM (62B, Few-shot) | 43.4 | - | Large Language Models Encode Clinical Knowledge |
| PubMedBERT (Gu et al., 2022) | 40.0 | 41.0 | MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering |
| SciBERT (Beltagy et al., 2019) | 39.0 | 39.0 | MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering |
| BioBERT (Lee et al., 2020) | 38.0 | 37.0 | MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering |
| BERT-Base (Devlin et al., 2019) | 35.0 | 33.0 | MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering |
| Flan-PaLM (8B, Few-shot) | 34.5 | - | Large Language Models Encode Clinical Knowledge |
| BLOOM (few-shot, k=5) | 32.5 | - | Galactica: A Large Language Model for Science |
| OPT (few-shot, k=5) | 29.6 | - | Galactica: A Large Language Model for Science |
| PaLM (8B, Few-shot) | 26.7 | - | Large Language Models Encode Clinical Knowledge |
| Med-PaLM 2 (ER) | - | 72.3 | Towards Expert-Level Medical Question Answering with Large Language Models |
| BioMedGPT-10B | - | 51.4 | BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine |
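For context, the Acc-% metric reported in both columns is simply the share of multiple-choice questions where the model's selected option matches the gold option, expressed as a percentage. A minimal sketch of that computation (the helper name and option labels are illustrative, not taken from any of the listed papers):

```python
def mcqa_accuracy(predictions, answers):
    """Percentage of multiple-choice questions answered correctly."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must be the same length")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)

# Hypothetical predicted vs. gold option labels for a 4-way MCQA set.
preds = ["A", "C", "B", "D", "A"]
gold  = ["A", "B", "B", "D", "C"]
print(f"Acc-%: {mcqa_accuracy(preds, gold):.1f}")  # 3 of 5 correct -> 60.0
```

Leaderboard entries differ only in *how* the prediction is produced (few-shot prompting, chain-of-thought with self-consistency, fine-tuned encoders, etc.); the scoring itself is this exact-match accuracy.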