HyperAI超神経

Multiple Choice Question Answering (MCQA) on 21

Evaluation Metrics

Dev Set (Acc-%)
Test Set (Acc-%)
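Both metrics are accuracy: the share of multiple-choice questions in the given split whose predicted option matches the gold option. The sketch below shows that calculation as a percentage. The function name `mcqa_accuracy_percent` and the A-D option labels are illustrative assumptions, not part of any benchmark toolkit.

```python
from typing import Sequence


def mcqa_accuracy_percent(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Hypothetical helper: percentage of questions whose predicted option equals the gold option."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty split")
    # Count exact matches between predicted and gold option labels.
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


# Toy four-question split with options labelled A-D (made-up data for illustration).
print(mcqa_accuracy_percent(["A", "C", "B", "D"], ["A", "C", "D", "D"]))  # 75.0
```

Three of the four predictions match, so the accuracy is 75.0%. A Dev Set or Test Set score is this value computed over that split.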

Evaluation Results

Performance of each model on this benchmark.

Comparison Table
| Model | Dev Set (Acc-%) | Test Set (Acc-%) |
|---|---|---|
| variational-open-domain-question-answering | 0.583 | 0.629 |
| large-language-models-encode-clinical | 0.345 | - |
| meditron-70b-scaling-medical-pretraining-for | 66.0 | - |
| towards-expert-level-medical-question | - | 0.723 |
| large-language-models-encode-clinical | 0.545 | - |
| large-language-models-encode-clinical | 0.536 | - |
| biomedgpt-open-multimodal-generative-pre | - | 0.514 |
| galactica-a-large-language-model-for-science-1 | 0.296 | - |
| medmcqa-a-large-scale-multi-subject-multi | 0.39 | 0.39 |
| can-large-language-models-reason-about | 0.597 | 0.627 |
| large-language-models-encode-clinical | 0.565 | - |
| galactica-a-large-language-model-for-science-1 | 0.529 | - |
| large-language-models-encode-clinical | 0.267 | - |
| large-language-models-encode-clinical | 0.576 | - |
| medmcqa-a-large-scale-multi-subject-multi | 0.38 | 0.37 |
| large-language-models-encode-clinical | 0.434 | - |
| galactica-a-large-language-model-for-science-1 | 0.325 | - |
| towards-expert-level-medical-question | - | 0.715 |
| medmcqa-a-large-scale-multi-subject-multi | 0.35 | 0.33 |
| large-language-models-encode-clinical | 0.462 | - |
| towards-expert-level-medical-question | - | 0.713 |
| medmcqa-a-large-scale-multi-subject-multi | 0.40 | 0.41 |
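The table mixes scales. Most entries look like fractions (for example 0.723), but the meditron entry (66.0) looks like a percentage. A "-" marks a split with no reported score. The sketch below puts a subset of rows on one scale and ranks those with a test score. The rule that any value ≤ 1.0 is a fraction is an assumption inferred from the table, and the helper name `to_percent` is hypothetical.

```python
from typing import Optional


def to_percent(score: Optional[float]) -> Optional[float]:
    """Hypothetical helper: normalize a leaderboard score to percent.

    Assumption: values <= 1.0 are fractions; larger values are already percentages.
    None represents a missing ("-") entry and is passed through unchanged.
    """
    if score is None:
        return None
    return round(score * 100, 1) if score <= 1.0 else score


# (model slug, dev, test): a subset of rows from the comparison table; None stands for "-".
rows = [
    ("variational-open-domain-question-answering", 0.583, 0.629),
    ("meditron-70b-scaling-medical-pretraining-for", 66.0, None),
    ("towards-expert-level-medical-question", None, 0.723),
    ("can-large-language-models-reason-about", 0.597, 0.627),
    ("medmcqa-a-large-scale-multi-subject-multi", 0.40, 0.41),
]

normalized = [(name, to_percent(dev), to_percent(test)) for name, dev, test in rows]

# Keep only rows that report a test score, then sort them from highest to lowest.
ranked_by_test = sorted(
    (row for row in normalized if row[2] is not None),
    key=lambda row: row[2],
    reverse=True,
)

for name, _dev, test in ranked_by_test:
    print(f"{test:5.1f}  {name}")
```

If the ≤ 1.0 rule does not match how a given score was reported, its normalized value will be wrong. Check the source paper before comparing entries across scales.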