HyperAI
HyperAI超神経
Multiple Choice Question Answering (MCQA)

Multiple Choice Question Answering Mcqa On 21
Evaluation Metrics: Dev Set (Acc-%), Test Set (Acc-%)

Evaluation Results

Performance results of each model on this benchmark:

| Model Name | Dev Set (Acc-%) | Test Set (Acc-%) | Paper Title |
|---|---|---|---|
| Meditron-70B (CoT + SC) | 66.0 | - | MEDITRON-70B: Scaling Medical Pretraining for Large Language Models |
| Codex 5-shot CoT | 0.597 | 0.627 | Can large language models reason about medical questions? |
| VOD (BioLinkBERT) | 0.583 | 0.629 | Variational Open-Domain Question Answering |
| Flan-PaLM (540B, SC) | 0.576 | - | Large Language Models Encode Clinical Knowledge |
| Flan-PaLM (540B, Few-shot) | 0.565 | - | Large Language Models Encode Clinical Knowledge |
| PaLM (540B, Few-shot) | 0.545 | - | Large Language Models Encode Clinical Knowledge |
| Flan-PaLM (540B, CoT) | 0.536 | - | Large Language Models Encode Clinical Knowledge |
| GAL 120B (zero-shot) | 0.529 | - | Galactica: A Large Language Model for Science |
| Flan-PaLM (62B, Few-shot) | 0.462 | - | Large Language Models Encode Clinical Knowledge |
| PaLM (62B, Few-shot) | 0.434 | - | Large Language Models Encode Clinical Knowledge |
| PubmedBERT (Gu et al., 2022) | 0.40 | 0.41 | MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering |
| SciBERT (Beltagy et al., 2019) | 0.39 | 0.39 | MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering |
| BioBERT (Lee et al., 2020) | 0.38 | 0.37 | MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering |
| BERT-Base (Devlin et al., 2019) | 0.35 | 0.33 | MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering |
| Flan-PaLM (8B, Few-shot) | 0.345 | - | Large Language Models Encode Clinical Knowledge |
| BLOOM (few-shot, k=5) | 0.325 | - | Galactica: A Large Language Model for Science |
| OPT (few-shot, k=5) | 0.296 | - | Galactica: A Large Language Model for Science |
| PaLM (8B, Few-shot) | 0.267 | - | Large Language Models Encode Clinical Knowledge |
| Med-PaLM 2 (ER) | - | 0.723 | Towards Expert-Level Medical Question Answering with Large Language Models |
| BioMedGPT-10B | - | 0.514 | BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine |
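Scores in the table are reported in mixed conventions: most entries are fractions (e.g. 0.597), while Meditron-70B is reported as a percentage (66.0). A minimal sketch of handling this programmatically — the `rows` structure and `normalize` helper are hypothetical illustrations, not an official HyperAI API; the scores are copied from a few rows of the table above:

```python
# Hypothetical in-memory representation of a few leaderboard rows.
# Scores are kept exactly as reported in the table; "-" becomes None.
rows = [
    {"model": "Meditron-70B (CoT + SC)", "dev": 66.0, "test": None},
    {"model": "Codex 5-shot CoT", "dev": 0.597, "test": 0.627},
    {"model": "VOD (BioLinkBERT)", "dev": 0.583, "test": 0.629},
    {"model": "Med-PaLM 2 (ER)", "dev": None, "test": 0.723},
]

def normalize(score):
    """Map a reported score to a fraction in [0, 1].

    Values above 1 are assumed to be percentages (e.g. 66.0 -> 0.66).
    """
    if score is None:
        return None
    return score / 100.0 if score > 1.0 else score

# Rank by test-set accuracy, placing entries with no test score last.
ranked = sorted(
    rows,
    key=lambda r: -(normalize(r["test"]) if r["test"] is not None else -1.0),
)
print([r["model"] for r in ranked])
# -> ['Med-PaLM 2 (ER)', 'VOD (BioLinkBERT)', 'Codex 5-shot CoT',
#     'Meditron-70B (CoT + SC)']
```

The normalization step matters for any cross-row comparison; comparing 66.0 against 0.723 directly would misrank the models.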