HyperAI

Question Answering on TriviaQA

Metrics

EM

Results

Performance results of various models on this benchmark

| Model Name | EM | Paper Title | Repository |
|---|---|---|---|
| LLaMA 65B (few-shot, k=64) | 73.0 | LLaMA: Open and Efficient Foundation Language Models | |
| GaC (Qwen2-72B-Instruct + Llama-3-70B-Instruct) | 79.29 | Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling | - |
| Mistral 7B (5-shot) | 69.9 | Mistral 7B | |
| EMDR2 | 71.4 | End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering | |
| GLaM 62B/64E (few-shot) | 75.8 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | - |
| RankRAG-llama3-8b (zero-shot, KILT) | 82.9 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | - |
| RA-DIT (zero-shot) | 75.4 | RA-DIT: Retrieval-Augmented Dual Instruction Tuning | - |
| Search-o1 | - | Search-o1: Agentic Search-Enhanced Large Reasoning Models | |
| LLaMA 65B (one-shot) | 71.6 | LLaMA: Open and Efficient Foundation Language Models | |
| LinkBERT (large) | - | LinkBERT: Pretraining Language Models with Document Links | |
| RankRAG-llama3-70b (zero-shot, KILT) | 86.5 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | - |
| FiE+PAQ | 72.6 | FiE: Building a Global Probability Space by Leveraging Early Fusion in Encoder for Open-Domain Question Answering | - |
| ReasonBERTB | - | ReasonBERT: Pre-trained to Reason with Distant Supervision | |
| DyREx | - | DyREx: Dynamic Query Representation for Extractive Question Answering | |
| Mnemonic Reader | 46.94 | Reinforced Mnemonic Reader for Machine Reading Comprehension | |
| Claude 2 (few-shot, k=5) | 87.5 | Model Card and Evaluations for Claude Models | - |
| BigBird-etc | - | Big Bird: Transformers for Longer Sequences | |
| GPT-4-0613 (zero-shot) | 84.8 | GPT-4 Technical Report | |
| PaLM 2-S (one-shot) | 75.2 | PaLM 2 Technical Report | |
| Claude Instant 1.1 (few-shot, k=5) | 78.9 | Model Card and Evaluations for Claude Models | - |