HyperAI

Question Answering on TruthfulQA

Metrics

EM (exact match)

Results

Performance results of various models on this benchmark

Comparison table
Model name | EM
chain-of-action-faithful-and-multimodal | 67.3
scaling-language-models-methods-analysis-1 | -
llama-open-and-efficient-foundation-language-1 | -
truthfulqa-measuring-how-models-mimic-human | -
shakti-a-2-5-billion-parameter-small-language | -
representation-engineering-a-top-down | -
galactica-a-large-language-model-for-science-1 | -
Model 8 | -
galactica-a-large-language-model-for-science-1 | -
galactica-a-large-language-model-for-science-1 | -
scaling-language-models-methods-analysis-1 | -
chain-of-action-faithful-and-multimodal | 63.3
tree-of-thoughts-deliberate-problem-solving-1 | 66.6
scaling-language-models-methods-analysis-1 | -
truthx-alleviating-hallucinations-by-editing | -
galactica-a-large-language-model-for-science-1 | -
llama-open-and-efficient-foundation-language-1 | -
truthfulqa-measuring-how-models-mimic-human | -
scaling-language-models-methods-analysis-1 | -
galactica-a-large-language-model-for-science-1 | -
Model 21 | -
Model 22 | -
truthfulqa-measuring-how-models-mimic-human | -
scaling-language-models-methods-analysis-1 | -
llama-open-and-efficient-foundation-language-1 | -
llama-open-and-efficient-foundation-language-1 | -
gpt-4-technical-report-1 | -
truthx-alleviating-hallucinations-by-editing | -
truthfulqa-measuring-how-models-mimic-human | -
galactica-a-large-language-model-for-science-1 | -
automatic-chain-of-thought-prompting-in-large | 42.2
scaling-language-models-methods-analysis-1 | -
representation-engineering-a-top-down | -