Question Answering on TruthfulQA
Evaluation Metric
EM
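The page does not say how EM is computed for this benchmark. As a rough sketch only, exact match (EM) is commonly scored by normalizing the predicted and reference answers and checking for string equality, then reporting the percentage of matching examples. The normalization steps below (lowercasing, stripping punctuation and English articles, collapsing whitespace) follow the widely used SQuAD-style convention and are an assumption here, not this leaderboard's documented procedure. The function names are hypothetical.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the leaderboard does not specify one.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> bool:
    """True if the normalized prediction equals any normalized reference."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(ref) for ref in references)


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of predictions that exactly match one of their references."""
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

Under these assumptions, a value such as 67.3 in the table would mean that about two thirds of the model's answers matched a reference answer after normalization.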
Evaluation Results
Performance of each model on this benchmark
Comparison Table
Model | EM |
---|---|
chain-of-action-faithful-and-multimodal | 67.3 |
scaling-language-models-methods-analysis-1 | - |
llama-open-and-efficient-foundation-language-1 | - |
truthfulqa-measuring-how-models-mimic-human | - |
shakti-a-2-5-billion-parameter-small-language | - |
representation-engineering-a-top-down | - |
galactica-a-large-language-model-for-science-1 | - |
Model 8 | - |
galactica-a-large-language-model-for-science-1 | - |
galactica-a-large-language-model-for-science-1 | - |
scaling-language-models-methods-analysis-1 | - |
chain-of-action-faithful-and-multimodal | 63.3 |
tree-of-thoughts-deliberate-problem-solving-1 | 66.6 |
scaling-language-models-methods-analysis-1 | - |
truthx-alleviating-hallucinations-by-editing | - |
galactica-a-large-language-model-for-science-1 | - |
llama-open-and-efficient-foundation-language-1 | - |
truthfulqa-measuring-how-models-mimic-human | - |
scaling-language-models-methods-analysis-1 | - |
galactica-a-large-language-model-for-science-1 | - |
Model 21 | - |
Model 22 | - |
truthfulqa-measuring-how-models-mimic-human | - |
scaling-language-models-methods-analysis-1 | - |
llama-open-and-efficient-foundation-language-1 | - |
llama-open-and-efficient-foundation-language-1 | - |
gpt-4-technical-report-1 | - |
truthx-alleviating-hallucinations-by-editing | - |
truthfulqa-measuring-how-models-mimic-human | - |
galactica-a-large-language-model-for-science-1 | - |
automatic-chain-of-thought-prompting-in-large | 42.2 |
scaling-language-models-methods-analysis-1 | - |
representation-engineering-a-top-down | - |