HyperAI超神経

Question Answering on TruthfulQA

Evaluation Metric

EM (exact match)
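This page does not say how EM is computed for TruthfulQA. As a rough illustration only, here is a minimal sketch of a generic exact-match score. The normalization rules (lowercasing, stripping punctuation and the English articles "a", "an", "the", collapsing whitespace) are an assumption borrowed from common SQuAD-style QA evaluation, and the function names `normalize`, `exact_match` and `em_score` are hypothetical.

```python
import re
import string

# Assumed normalization (SQuAD-style); not confirmed for this leaderboard.
PUNCT = set(string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> float:
    """Return 1.0 if the normalized prediction equals any normalized reference, else 0.0."""
    pred = normalize(prediction)
    return float(any(pred == normalize(ref) for ref in references))


def em_score(predictions: list[str], references_list: list[list[str]]) -> float:
    """Mean exact match over a dataset, reported as a percentage."""
    scores = [exact_match(p, refs) for p, refs in zip(predictions, references_list)]
    return 100.0 * sum(scores) / len(scores)


# Usage: one of two predictions matches after normalization.
print(em_score(["The Earth is round.", "Paris"], [["the earth is round"], ["Lyon"]]))
```

Under these assumptions the usage line prints `50.0`: "The Earth is round." and "the earth is round" both normalize to "earth is round", while "Paris" does not match "Lyon".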

Evaluation Results

Performance of each model on this benchmark

Comparison Table
| Model | EM |
|---|---|
| chain-of-action-faithful-and-multimodal | 67.3 |
| scaling-language-models-methods-analysis-1 | - |
| llama-open-and-efficient-foundation-language-1 | - |
| truthfulqa-measuring-how-models-mimic-human | - |
| shakti-a-2-5-billion-parameter-small-language | - |
| representation-engineering-a-top-down | - |
| galactica-a-large-language-model-for-science-1 | - |
| Model 8 | - |
| galactica-a-large-language-model-for-science-1 | - |
| galactica-a-large-language-model-for-science-1 | - |
| scaling-language-models-methods-analysis-1 | - |
| chain-of-action-faithful-and-multimodal | 63.3 |
| tree-of-thoughts-deliberate-problem-solving-1 | 66.6 |
| scaling-language-models-methods-analysis-1 | - |
| truthx-alleviating-hallucinations-by-editing | - |
| galactica-a-large-language-model-for-science-1 | - |
| llama-open-and-efficient-foundation-language-1 | - |
| truthfulqa-measuring-how-models-mimic-human | - |
| scaling-language-models-methods-analysis-1 | - |
| galactica-a-large-language-model-for-science-1 | - |
| Model 21 | - |
| Model 22 | - |
| truthfulqa-measuring-how-models-mimic-human | - |
| scaling-language-models-methods-analysis-1 | - |
| llama-open-and-efficient-foundation-language-1 | - |
| llama-open-and-efficient-foundation-language-1 | - |
| gpt-4-technical-report-1 | - |
| truthx-alleviating-hallucinations-by-editing | - |
| truthfulqa-measuring-how-models-mimic-human | - |
| galactica-a-large-language-model-for-science-1 | - |
| automatic-chain-of-thought-prompting-in-large | 42.2 |
| scaling-language-models-methods-analysis-1 | - |
| representation-engineering-a-top-down | - |