Multiple Choice Question Answering (MCQA) on 21

Evaluation Metrics

Dev Set (Acc-%)
Test Set (Acc-%)
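
Both metrics are accuracy on a multiple-choice split: the share of questions for which the predicted option matches the gold option. A minimal sketch of that calculation follows, assuming one gold option label per question. The function name `accuracy_percent` and the letter labels are illustrative and do not come from the leaderboard.

```python
from typing import Sequence


def accuracy_percent(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the percentage of predictions that match the gold answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact matches between predicted and gold option labels.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical example: three of four questions answered correctly.
score = accuracy_percent(["A", "B", "C", "D"], ["A", "B", "D", "D"])
```

Most leaderboard entries appear as fractions (for example 0.583) rather than percentages, so a value from this sketch would need dividing by 100 before being compared with them.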

Evaluation Results

Performance results of each model on this benchmark.

Comparison Table
Model name                                        Dev Set (Acc-%)  Test Set (Acc-%)
variational-open-domain-question-answering        0.583            0.629
large-language-models-encode-clinical             0.345            -
meditron-70b-scaling-medical-pretraining-for      66.0             -
towards-expert-level-medical-question             -                0.723
large-language-models-encode-clinical             0.545            -
large-language-models-encode-clinical             0.536            -
biomedgpt-open-multimodal-generative-pre          -                0.514
galactica-a-large-language-model-for-science-1    0.296            -
medmcqa-a-large-scale-multi-subject-multi         0.39             0.39
can-large-language-models-reason-about            0.597            0.627
large-language-models-encode-clinical             0.565            -
galactica-a-large-language-model-for-science-1    0.529            -
large-language-models-encode-clinical             0.267            -
large-language-models-encode-clinical             0.576            -
medmcqa-a-large-scale-multi-subject-multi         0.38             0.37
large-language-models-encode-clinical             0.434            -
galactica-a-large-language-model-for-science-1    0.325            -
towards-expert-level-medical-question             -                0.715
medmcqa-a-large-scale-multi-subject-multi         0.35             0.33
large-language-models-encode-clinical             0.462            -
towards-expert-level-medical-question             -                0.713
medmcqa-a-large-scale-multi-subject-multi         0.40             0.41