HyperAI

Question Answering on TruthfulQA

Evaluation Metric

EM (exact match)
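The page does not define how EM is computed for TruthfulQA. Assuming EM means exact match with SQuAD-style answer normalization (lowercasing, stripping punctuation and the English articles "a", "an", "the", collapsing whitespace), a minimal sketch of the per-answer check and the corpus-level percentage might look like this. The function names `normalize`, `exact_match` and `em_score` are hypothetical and the normalization rules are an assumption, not the leaderboard's documented procedure.

```python
import re
import string


def normalize(text: str) -> str:
    """Assumed SQuAD-style normalization: lowercase, drop punctuation and articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> float:
    """Return 1.0 if the normalized prediction equals any normalized reference, else 0.0."""
    pred = normalize(prediction)
    return float(any(pred == normalize(ref) for ref in references))


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Corpus-level EM as a percentage (the scale used in the comparison table)."""
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)


# Hypothetical predictions and reference answers, for illustration only.
preds = ["The Eiffel Tower", "Nothing happens", "Paris, France"]
refs = [["Eiffel Tower"], ["Nothing in particular happens"], ["Paris"]]
print(em_score(preds, refs))
```

Only the first prediction matches after normalization, so this example scores one out of three. Under this definition a paraphrase that is semantically correct still scores zero, which is why EM is a strict metric.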

Evaluation Results

Performance of each model on this benchmark.

Comparison Table

A dash (-) means no EM score is listed for that entry.

Model name                                      EM
chain-of-action-faithful-and-multimodal         67.3
scaling-language-models-methods-analysis-1      -
llama-open-and-efficient-foundation-language-1  -
truthfulqa-measuring-how-models-mimic-human     -
shakti-a-2-5-billion-parameter-small-language   -
representation-engineering-a-top-down           -
galactica-a-large-language-model-for-science-1  -
Model 8                                         -
galactica-a-large-language-model-for-science-1  -
galactica-a-large-language-model-for-science-1  -
scaling-language-models-methods-analysis-1      -
chain-of-action-faithful-and-multimodal         63.3
tree-of-thoughts-deliberate-problem-solving-1   66.6
scaling-language-models-methods-analysis-1      -
truthx-alleviating-hallucinations-by-editing    -
galactica-a-large-language-model-for-science-1  -
llama-open-and-efficient-foundation-language-1  -
truthfulqa-measuring-how-models-mimic-human     -
scaling-language-models-methods-analysis-1      -
galactica-a-large-language-model-for-science-1  -
Model 21                                        -
Model 22                                        -
truthfulqa-measuring-how-models-mimic-human     -
scaling-language-models-methods-analysis-1      -
llama-open-and-efficient-foundation-language-1  -
llama-open-and-efficient-foundation-language-1  -
gpt-4-technical-report-1                        -
truthx-alleviating-hallucinations-by-editing    -
truthfulqa-measuring-how-models-mimic-human     -
galactica-a-large-language-model-for-science-1  -
automatic-chain-of-thought-prompting-in-large   42.2
scaling-language-models-methods-analysis-1      -
representation-engineering-a-top-down           -
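Only four entries in the comparison table carry an EM value, and the table is not sorted by score. A minimal sketch of ranking the scored entries is shown below, using an excerpt of rows copied from the table. Reading the tree-of-thoughts entry as the slug `tree-of-thoughts-deliberate-problem-solving-1` with an EM of 66.6 is an assumption, as are the helper name `ranked` and the `ROWS` excerpt.

```python
# Excerpt of the comparison table as (paper slug, EM cell); "-" marks a row with no EM value.
ROWS = [
    ("chain-of-action-faithful-and-multimodal", "67.3"),
    ("scaling-language-models-methods-analysis-1", "-"),
    ("chain-of-action-faithful-and-multimodal", "63.3"),
    # Assumed reading: slug ends in "-1" and the EM value is 66.6.
    ("tree-of-thoughts-deliberate-problem-solving-1", "66.6"),
    ("gpt-4-technical-report-1", "-"),
    ("automatic-chain-of-thought-prompting-in-large", "42.2"),
]


def ranked(rows):
    """Drop rows without a score, then sort the rest by EM, highest first."""
    scored = [(slug, float(em)) for slug, em in rows if em != "-"]
    return sorted(scored, key=lambda row: row[1], reverse=True)


for rank, (slug, em) in enumerate(ranked(ROWS), start=1):
    print(f"{rank}. {slug}: {em:.1f}")
```

Because Python's `sorted` is stable, any entries with equal EM would keep their original table order.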