Question Answering on TriviaQA
Evaluation Metrics
EM
Evaluation Results
Performance results of each model on this benchmark
Comparison Table
Model Name | EM |
---|---|
llama-open-and-efficient-foundation-language-1 | 73.0 |
breaking-the-ceiling-of-the-llm-community-by | 79.29 |
mistral-7b | 69.9 |
end-to-end-training-of-multi-document-reader | 71.4 |
glam-efficient-scaling-of-language-models | 75.8 |
rankrag-unifying-context-ranking-with | 82.9 |
ra-dit-retrieval-augmented-dual-instruction | 75.4 |
search-o1-agentic-search-enhanced-large | - |
llama-open-and-efficient-foundation-language-1 | 71.6 |
linkbert-pretraining-language-models-with | - |
rankrag-unifying-context-ranking-with | 86.5 |
fie-building-a-global-probability-space-by | 72.6 |
reasonbert-pre-trained-to-reason-with-distant | - |
dyrex-dynamic-query-representation-for | - |
reinforced-mnemonic-reader-for-machine | 46.94 |
model-card-and-evaluations-for-claude-models | 87.5 |
big-bird-transformers-for-longer-sequences | - |
gpt-4-technical-report-1 | 84.8 |
palm-2-technical-report-1 | 75.2 |
model-card-and-evaluations-for-claude-models | 78.9 |
chatqa-building-gpt-4-level-conversational-qa | 69.0 |
glam-efficient-scaling-of-language-models | 75.8 |
replug-retrieval-augmented-black-box-language | 76.8 |
understand-what-llm-needs-dual-preference | - |
branch-train-mix-mixing-expert-llms-into-a | 57.1 |
shakti-a-2-5-billion-parameter-small-language | 58.2 |
palm-scaling-language-modeling-with-pathways-1 | 76.9 |
spanbert-improving-pre-training-by | - |
llama-2-open-foundation-and-fine-tuned-chat | 85 |
memen-multi-layer-embedding-with-memory | 43.16 |
rankrag-unifying-context-ranking-with | 72.6 |
unitedqa-a-hybrid-approach-for-open-domain | - |
model-card-and-evaluations-for-claude-models | 86.7 |
palm-scaling-language-modeling-with-pathways-1 | 81.4 |
language-models-are-few-shot-learners | 71.2 |
llama-open-and-efficient-foundation-language-1 | 72.6 |
reasonbert-pre-trained-to-reason-with-distant | - |
chatqa-building-gpt-4-level-conversational-qa | 81.0 |
llama-open-and-efficient-foundation-language-1 | 68.2 |
simple-and-effective-multi-paragraph-reading | 66.37 |
chatqa-building-gpt-4-level-conversational-qa | 85.6 |
replug-retrieval-augmented-black-box-language | 77.3 |
glam-efficient-scaling-of-language-models | 71.3 |
palm-2-technical-report-1 | 81.7 |
distilling-knowledge-from-reader-to-retriever-1 | 72.1 |
Model 46 | 87 |
dense-passage-retrieval-for-open-domain | 56.8 |
finetuned-language-models-are-zero-shot | 56.7 |
dynamic-integration-of-background-knowledge | 50.56 |
leveraging-passage-retrieval-with-generative | 67.6 |
palm-scaling-language-modeling-with-pathways-1 | 81.4 |
mention-memory-incorporating-textual-1 | 65.8 |
190600300 | 45 |
retrieval-augmented-generation-for-knowledge | 56.1 |
memoreader-large-scale-reading-comprehension | 67.21 |
palm-2-technical-report-1 | 86.1 |