HyperAI

Multiple Choice Question Answering (MCQA) on MedMCQA

Metrics

- Dev Set (Accuracy, reported as a fraction)
- Test Set (Accuracy, reported as a fraction)

Results

Performance results of various models on this benchmark

| Model Name | Dev Set (Acc) | Test Set (Acc) | Paper Title | Repository |
|---|---|---|---|---|
| VOD (BioLinkBERT) | 0.583 | 0.629 | Variational Open-Domain Question Answering | - |
| Flan-PaLM (8B, Few-shot) | 0.345 | - | Large Language Models Encode Clinical Knowledge | - |
| Meditron-70B (CoT + SC) | 0.660 | - | MEDITRON-70B: Scaling Medical Pretraining for Large Language Models | - |
| Med-PaLM 2 (ER) | - | 0.723 | Towards Expert-Level Medical Question Answering with Large Language Models | - |
| PaLM (540B, Few-shot) | 0.545 | - | Large Language Models Encode Clinical Knowledge | - |
| Flan-PaLM (540B, CoT) | 0.536 | - | Large Language Models Encode Clinical Knowledge | - |
| BioMedGPT-10B | - | 0.514 | BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine | - |
| OPT (few-shot, k=5) | 0.296 | - | Galactica: A Large Language Model for Science | - |
| SciBERT (Beltagy et al., 2019) | 0.39 | 0.39 | MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering | - |
| Codex (5-shot CoT) | 0.597 | 0.627 | Can large language models reason about medical questions? | - |
| Flan-PaLM (540B, Few-shot) | 0.565 | - | Large Language Models Encode Clinical Knowledge | - |
| GAL 120B (zero-shot) | 0.529 | - | Galactica: A Large Language Model for Science | - |
| PaLM (8B, Few-shot) | 0.267 | - | Large Language Models Encode Clinical Knowledge | - |
| Flan-PaLM (540B, SC) | 0.576 | - | Large Language Models Encode Clinical Knowledge | - |
| BioBERT (Lee et al., 2020) | 0.38 | 0.37 | MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering | - |
| PaLM (62B, Few-shot) | 0.434 | - | Large Language Models Encode Clinical Knowledge | - |
| BLOOM (few-shot, k=5) | 0.325 | - | Galactica: A Large Language Model for Science | - |
| Med-PaLM 2 (CoT + SC) | - | 0.715 | Towards Expert-Level Medical Question Answering with Large Language Models | - |
| BERT-Base (Devlin et al., 2019) | 0.35 | 0.33 | MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering | - |
| Flan-PaLM (62B, Few-shot) | 0.462 | - | Large Language Models Encode Clinical Knowledge | - |
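The metric behind both columns is plain multiple-choice accuracy: the fraction of questions whose predicted option matches the gold option. A minimal sketch of that computation, using hypothetical predictions and labels (not data from any model in the table):

```python
# Minimal sketch of the MCQA accuracy metric used in the table above.
# The predictions and gold labels here are hypothetical placeholders.

def mcqa_accuracy(predictions, gold_labels):
    """Return the fraction of questions answered correctly (0.0-1.0)."""
    if len(predictions) != len(gold_labels):
        raise ValueError("predictions and gold_labels must have equal length")
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)

# Hypothetical 4-option (A-D) predictions for five questions.
preds = ["A", "C", "B", "D", "A"]
gold  = ["A", "C", "D", "D", "B"]
print(f"Accuracy: {mcqa_accuracy(preds, gold):.3f}")  # 3/5 correct -> 0.600
```

Multiplying this fraction by 100 gives the percentage form sometimes seen in papers (e.g. 0.660 vs. 66.0 for Meditron-70B).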