HyperAI超神経

Question Answering on DROP

Evaluation Metrics

Accuracy

Evaluation Results

Performance results of each model on this benchmark

Comparison Table
Model Name                                Accuracy
large-language-models-can-self-improve    78.2
large-language-models-can-self-improve    83
large-language-models-can-self-improve    71.7
large-language-models-can-self-improve    60
large-language-models-can-self-improve    70.6
large-language-models-can-self-improve    76.2