Multiple Choice Question Answering (MCQA) on MedMCQA
Evaluation Metrics
Dev Set (Acc-%)
Test Set (Acc-%)
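Both metrics are plain classification accuracy over the four answer options, reported on the dev and test splits respectively. A minimal sketch of the computation (the helper name `mcqa_accuracy` is ours, not from the benchmark):

```python
from typing import Sequence

def mcqa_accuracy(predictions: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of questions where the predicted option index matches the gold index."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Example: 3 of 4 questions answered correctly -> 0.75
print(mcqa_accuracy([0, 2, 1, 3], [0, 2, 1, 1]))
```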
Evaluation Results
Performance results for each model on this benchmark. Entries are identified by the slug of their source paper; rows sharing a slug are different model variants evaluated in that paper. A sketch of how such scores are produced follows the table.
Comparison Table
Model | Dev Set (Acc-%) | Test Set (Acc-%) |
---|---|---|
variational-open-domain-question-answering | 0.583 | 0.629 |
large-language-models-encode-clinical | 0.345 | - |
meditron-70b-scaling-medical-pretraining-for | 0.660 | - |
towards-expert-level-medical-question | - | 0.723 |
large-language-models-encode-clinical | 0.545 | - |
large-language-models-encode-clinical | 0.536 | - |
biomedgpt-open-multimodal-generative-pre | - | 0.514 |
galactica-a-large-language-model-for-science-1 | 0.296 | - |
medmcqa-a-large-scale-multi-subject-multi | 0.39 | 0.39 |
can-large-language-models-reason-about | 0.597 | 0.627 |
large-language-models-encode-clinical | 0.565 | - |
galactica-a-large-language-model-for-science-1 | 0.529 | - |
large-language-models-encode-clinical | 0.267 | - |
large-language-models-encode-clinical | 0.576 | - |
medmcqa-a-large-scale-multi-subject-multi | 0.38 | 0.37 |
large-language-models-encode-clinical | 0.434 | - |
galactica-a-large-language-model-for-science-1 | 0.325 | - |
towards-expert-level-medical-question | - | 0.715 |
medmcqa-a-large-scale-multi-subject-multi | 0.35 | 0.33 |
large-language-models-encode-clinical | 0.462 | - |
towards-expert-level-medical-question | - | 0.713 |
medmcqa-a-large-scale-multi-subject-multi | 0.40 | 0.41 |
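A hedged sketch of how a dev-set score like those above is typically computed, assuming MedMCQA is available on the Hugging Face Hub under the id `openlifescienceai/medmcqa` with fields `question`, `opa`–`opd`, and `cop` (the 0-based correct option index); `answer_question` is a hypothetical stand-in for the model under evaluation:

```python
from datasets import load_dataset  # pip install datasets

# Assumption: MedMCQA is hosted on the Hugging Face Hub under this id,
# with one row per question and `cop` as the 0-based correct option index.
dev = load_dataset("openlifescienceai/medmcqa", split="validation")

def answer_question(question: str, options: list[str]) -> int:
    """Hypothetical stand-in for the model under evaluation.

    A real run would prompt an LLM with the question and the four
    options and parse its chosen option; here we always pick option A.
    """
    return 0

correct = 0
for row in dev:
    options = [row["opa"], row["opb"], row["opc"], row["opd"]]
    if answer_question(row["question"], options) == row["cop"]:
        correct += 1

print(f"Dev Set (Acc-%): {correct / len(dev):.3f}")
```

Test-set accuracy is computed the same way on the held-out test split, which is why several entries report only one of the two columns.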