Question Answering on TruthfulQA
Evaluation Metric
EM (Exact Match)
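EM (Exact Match) counts a prediction as correct only when it matches a reference answer string exactly after normalization, and the reported score is the percentage of matched examples. Below is a minimal sketch of such a scorer; the normalization rules (lowercasing, stripping punctuation and articles) and the function names are common conventions assumed for illustration, not the leaderboard's official scoring code.

```python
# Minimal sketch of an Exact Match (EM) scorer.
# Assumption: SQuAD-style normalization; not the official TruthfulQA scorer.
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> float:
    """1.0 if the normalized prediction equals any normalized reference."""
    pred = normalize(prediction)
    return float(any(pred == normalize(ref) for ref in references))


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Corpus-level EM: mean per-example exact match, as a percentage."""
    matches = [exact_match(p, refs) for p, refs in zip(predictions, references)]
    return 100.0 * sum(matches) / len(matches)
```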
Evaluation Results
Performance of each model on this benchmark.
Comparison Table
| Model | EM |
| --- | --- |
| chain-of-action-faithful-and-multimodal | 67.3 |
| scaling-language-models-methods-analysis-1 | - |
| llama-open-and-efficient-foundation-language-1 | - |
| truthfulqa-measuring-how-models-mimic-human | - |
| shakti-a-2-5-billion-parameter-small-language | - |
| representation-engineering-a-top-down | - |
| galactica-a-large-language-model-for-science-1 | - |
| Model 8 | - |
| galactica-a-large-language-model-for-science-1 | - |
| galactica-a-large-language-model-for-science-1 | - |
| scaling-language-models-methods-analysis-1 | - |
| chain-of-action-faithful-and-multimodal | 63.3 |
| tree-of-thoughts-deliberate-problem-solving-1 | 66.6 |
| scaling-language-models-methods-analysis-1 | - |
| truthx-alleviating-hallucinations-by-editing | - |
| galactica-a-large-language-model-for-science-1 | - |
| llama-open-and-efficient-foundation-language-1 | - |
| truthfulqa-measuring-how-models-mimic-human | - |
| scaling-language-models-methods-analysis-1 | - |
| galactica-a-large-language-model-for-science-1 | - |
| Model 21 | - |
| Model 22 | - |
| truthfulqa-measuring-how-models-mimic-human | - |
| scaling-language-models-methods-analysis-1 | - |
| llama-open-and-efficient-foundation-language-1 | - |
| llama-open-and-efficient-foundation-language-1 | - |
| gpt-4-technical-report-1 | - |
| truthx-alleviating-hallucinations-by-editing | - |
| truthfulqa-measuring-how-models-mimic-human | - |
| galactica-a-large-language-model-for-science-1 | - |
| automatic-chain-of-thought-prompting-in-large | 42.2 |
| scaling-language-models-methods-analysis-1 | - |
| representation-engineering-a-top-down | - |