Multiple Choice Question Answering (MCQA) On 21
Evaluation Metrics
Dev Set (Acc-%)
Test Set (Acc-%)

Evaluation Results
Performance results of each model on this benchmark
Model Name | Dev Set (Acc-%) | Test Set (Acc-%) | Paper Title
Meditron-70B (CoT + SC) | 66.0 | - | MEDITRON-70B: Scaling Medical Pretraining for Large Language Models
Codex 5-shot CoT | 0.597 | 0.627 | Can large language models reason about medical questions?
VOD (BioLinkBERT) | 0.583 | 0.629 | Variational Open-Domain Question Answering
Flan-PaLM (540B, SC) | 0.576 | - | Large Language Models Encode Clinical Knowledge
Flan-PaLM (540B, Few-shot) | 0.565 | - | Large Language Models Encode Clinical Knowledge
PaLM (540B, Few-shot) | 0.545 | - | Large Language Models Encode Clinical Knowledge
Flan-PaLM (540B, CoT) | 0.536 | - | Large Language Models Encode Clinical Knowledge
GAL 120B (zero-shot) | 0.529 | - | Galactica: A Large Language Model for Science
Flan-PaLM (62B, Few-shot) | 0.462 | - | Large Language Models Encode Clinical Knowledge
PaLM (62B, Few-shot) | 0.434 | - | Large Language Models Encode Clinical Knowledge
PubMedBERT (Gu et al., 2022) | 0.40 | 0.41 | MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering
SciBERT (Beltagy et al., 2019) | 0.39 | 0.39 | MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering
BioBERT (Lee et al., 2020) | 0.38 | 0.37 | MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering
BERT-Base (Devlin et al., 2019) | 0.35 | 0.33 | MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering
Flan-PaLM (8B, Few-shot) | 0.345 | - | Large Language Models Encode Clinical Knowledge
BLOOM (few-shot, k=5) | 0.325 | - | Galactica: A Large Language Model for Science
OPT (few-shot, k=5) | 0.296 | - | Galactica: A Large Language Model for Science
PaLM (8B, Few-shot) | 0.267 | - | Large Language Models Encode Clinical Knowledge
Med-PaLM 2 (ER) | - | 0.723 | Towards Expert-Level Medical Question Answering with Large Language Models
BioMedGPT-10B | - | 0.514 | BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine
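The Dev Set and Test Set columns above report plain accuracy: the fraction of multiple-choice questions for which the model's selected option matches the gold option. The sketch below is a minimal illustration of that computation, not the leaderboard's official scorer; the field names ("prediction", "gold") and the exact-match comparison on option letters are assumptions for illustration.

```python
from typing import Iterable, Mapping


def mcqa_accuracy(examples: Iterable[Mapping[str, str]]) -> float:
    """Fraction of examples whose predicted option matches the gold option exactly."""
    total = 0
    correct = 0
    for ex in examples:
        total += 1
        # Normalize whitespace and case before comparing option labels (e.g. "b" vs "B").
        if ex["prediction"].strip().upper() == ex["gold"].strip().upper():
            correct += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical dev-set predictions; real evaluations run over the full split.
    dev_results = [
        {"prediction": "B", "gold": "B"},
        {"prediction": "C", "gold": "A"},
        {"prediction": "D", "gold": "D"},
    ]
    # Printed as a fraction; multiply by 100 for the Acc-% convention used in the table.
    print(f"Dev Set accuracy: {mcqa_accuracy(dev_results):.3f}")
```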