HyperAI超神经

Question Answering On Natural Questions

Evaluation Metric

EM (Exact Match)
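The following is a rough illustration of how EM is typically computed for open-domain QA benchmarks such as Natural Questions. It assumes the answer normalization popularized by the SQuAD evaluation script: lowercasing, stripping punctuation and the articles "a", "an", and "the", and collapsing whitespace. A prediction scores 1 if its normalized form equals any normalized gold answer, and the reported EM is the percentage of questions that score 1. The function names are hypothetical, and the papers in the table may use slightly different normalization, so treat this as a sketch rather than the exact scorer behind each number.

```python
import re
import string

# Assumption: SQuAD-style normalization; individual papers may differ.
PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> int:
    """Return 1 if the normalized prediction equals any normalized gold answer, else 0."""
    pred = normalize_answer(prediction)
    return int(any(pred == normalize_answer(gold) for gold in gold_answers))


def corpus_em(predictions: list[str], references: list[list[str]]) -> float:
    """Average per-question EM over a dataset, reported as a percentage."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from Natural Questions.
    preds = ["The Eiffel Tower", "1969", "Paris, France"]
    refs = [["Eiffel Tower"], ["July 1969", "1969"], ["Paris"]]
    print(corpus_em(preds, refs))
```

With the toy inputs above, two of the three predictions match, giving an EM of about 66.7. "Paris, France" scores 0 against "Paris" even though it is arguably correct, which illustrates how strict EM is compared with token-overlap metrics such as F1.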

Evaluation Results

Performance of each model on this benchmark

Comparison Table
Model name                                      EM
rankrag-unifying-context-ranking-with           50.0
palm-2-technical-report-1                       25.3
Model 3                                         26.07
search-o1-agentic-search-enhanced-large         34
few-shot-learning-with-retrieval-augmented      42.4
llama-2-open-foundation-and-fine-tuned-chat     33.0
mistral-7b                                      28.8
chatqa-building-gpt-4-level-conversational-qa   47.0
llama-open-and-efficient-foundation-language-1  35.0
glam-efficient-scaling-of-language-models       26.3
few-shot-learning-with-retrieval-augmented      64.0
replug-retrieval-augmented-black-box-language   44.7
rankrag-unifying-context-ranking-with           46.1
retrieval-as-attention-end-to-end-learning-of   54.7
llama-open-and-efficient-foundation-language-1  39.9
rankrag-unifying-context-ranking-with           54.2
dense-passage-retrieval-for-open-domain         41.5
leveraging-passage-retrieval-with-generative    54.7
scaling-language-models-methods-analysis-1      28.2
few-shot-learning-with-retrieval-augmented      60.4
llama-open-and-efficient-foundation-language-1  24.9
palm-scaling-language-modeling-with-pathways-1  21.2
ask-me-anything-a-simple-strategy-for           19.6
chatqa-building-gpt-4-level-conversational-qa   42.7
ask-me-anything-a-simple-strategy-for           13.7
leveraging-passage-retrieval-with-generative    51.4
palm-2-technical-report-1                       32.0
language-models-are-few-shot-learners           29.9
palm-scaling-language-modeling-with-pathways-1  29.3
fie-building-a-global-probability-space-by      58.4
llama-open-and-efficient-foundation-language-1  31.0
replug-retrieval-augmented-black-box-language   45.5
realm-retrieval-augmented-language-model-pre    40.4
r2-d2-a-modular-baseline-for-open-domain        55.9
ask-me-anything-a-simple-strategy-for           19.7
understand-what-llm-needs-dual-preference       59.19
retrieval-augmented-generation-for-knowledge    44.5
palm-scaling-language-modeling-with-pathways-1  39.6
glam-efficient-scaling-of-language-models       24.7
glam-efficient-scaling-of-language-models       32.5
end-to-end-training-of-multi-document-reader    52.5
rankrag-unifying-context-ranking-with           50.6
training-compute-optimal-large-language         35.5
improving-language-models-by-retrieving-from    45.5
blended-rag-improving-rag-retriever-augmented   42.63
few-shot-learning-with-retrieval-augmented      45.1
palm-2-technical-report-1                       37.5