Question Answering on PubMedQA
Evaluation metric: Accuracy

Evaluation results: each model's reported performance on this benchmark.
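PubMedQA frames each question as a three-way classification over the expert's final decision (yes / no / maybe), so the single reported metric is plain accuracy: the fraction of questions whose predicted decision matches the gold label. A minimal sketch for illustration; the `gold` and `pred` lists here are hypothetical placeholders, not benchmark data:

```python
# Accuracy on PubMedQA-style three-way decisions (yes / no / maybe).
# Toy data only: `gold` and `pred` are hypothetical placeholders.

def accuracy(gold: list[str], pred: list[str]) -> float:
    """Fraction of questions whose predicted decision matches the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

gold = ["yes", "no", "maybe", "yes"]
pred = ["yes", "no", "yes", "yes"]
print(f"accuracy = {accuracy(gold, pred):.1%}")  # accuracy = 75.0%
```

The table below lists each model's reported accuracy on this benchmark.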
| Model | Accuracy (%) | Paper |
| --- | --- | --- |
| Meditron-70B (CoT + SC) | 81.6 | MEDITRON-70B: Scaling Medical Pretraining for Large Language Models |
| BioGPT-Large (1.5B) | 81.0 | BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining |
| RankRAG-llama3-70B (zero-shot) | 79.8 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs |
| Med-PaLM 2 (5-shot) | 79.2 | Towards Expert-Level Medical Question Answering with Large Language Models |
| Flan-PaLM (540B, few-shot) | 79.0 | Large Language Models Encode Clinical Knowledge |
| BioGPT (345M) | 78.2 | BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining |
| Codex (5-shot CoT) | 78.2 | Can large language models reason about medical questions? |
| Human performance (single annotator) | 78.0 | PubMedQA: A Dataset for Biomedical Research Question Answering |
| GAL 120B (zero-shot) | 77.6 | Galactica: A Large Language Model for Science |
| Flan-PaLM (62B, few-shot) | 77.2 | Large Language Models Encode Clinical Knowledge |
| MediSwift-XL | 76.8 | MediSwift: Efficient Sparse Pre-trained Biomedical Language Models |
| Flan-T5-XXL | 76.8 | Evaluation of large language model performance on the Biomedical Language Understanding and Reasoning Benchmark |
| BioMedGPT-10B | 76.1 | BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine |
| Claude 3 Opus (5-shot) | 75.8 | The Claude 3 Model Family: Opus, Sonnet, Haiku |
| Flan-PaLM (540B, SC) | 75.2 | Large Language Models Encode Clinical Knowledge |
| Med-PaLM 2 (ER) | 75.0 | Towards Expert-Level Medical Question Answering with Large Language Models |
| Claude 3 Opus (zero-shot) | 74.9 | The Claude 3 Model Family: Opus, Sonnet, Haiku |
| Med-PaLM 2 (CoT + SC) | 74.0 | Towards Expert-Level Medical Question Answering with Large Language Models |
| BLOOM (zero-shot) | 73.6 | Galactica: A Large Language Model for Science |
| CoT-T5-11B (1024-shot) | 73.42 | The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning |

Showing the first 20 of 29 leaderboard entries.
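Results like these are computed against PubMedQA's expert-labeled split (PQA-L), which pairs each research question with an abstract and a yes/no/maybe decision. A hedged sketch of loading it and scoring a trivial baseline, assuming the Hugging Face `datasets` library and the `pubmed_qa` dataset card with its `pqa_labeled` config and `final_decision` field (these names are assumptions, not taken from this page):

```python
# A sketch, not the leaderboard's evaluation harness. Assumes the Hugging Face
# `pubmed_qa` dataset card (`pqa_labeled` config: expert-annotated questions
# with a yes/no/maybe `final_decision` label).
from datasets import load_dataset

ds = load_dataset("pubmed_qa", "pqa_labeled", split="train")

first = ds[0]
print(first["question"])        # the biomedical research question
print(first["final_decision"])  # gold label: "yes", "no", or "maybe"

# Any model is scored the same way: compare its decisions to `final_decision`.
# Hypothetical baseline for illustration: always answer "yes".
always_yes_acc = sum(ex["final_decision"] == "yes" for ex in ds) / len(ds)
print(f"always-'yes' baseline accuracy = {always_yes_acc:.1%}")
```

Note that several leaderboard entries (CoT, SC, few-shot) differ only in prompting strategy, not in the underlying model; the scoring step against the gold decisions is the same in every case.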