HyperAI

Question Answering on TriviaQA

Metrics

EM

Results

Performance results of various models on this benchmark

| Model Name | EM | Paper Title | Repository |
|---|---|---|---|
| LLaMA 65B (few-shot, k=64) | 73.0 | LLaMA: Open and Efficient Foundation Language Models | |
| GaC (Qwen2-72B-Instruct + Llama-3-70B-Instruct) | 79.29 | Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling | - |
| Mistral 7B (5-shot) | 69.9 | Mistral 7B | |
| EMDR2 | 71.4 | End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering | |
| GLaM 62B/64E (few-shot) | 75.8 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | - |
| RankRAG-llama3-8b (zero-shot, KILT) | 82.9 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | - |
| RA-DIT (zero-shot) | 75.4 | RA-DIT: Retrieval-Augmented Dual Instruction Tuning | - |
| Search-o1 | - | Search-o1: Agentic Search-Enhanced Large Reasoning Models | |
| LLaMA 65B (one-shot) | 71.6 | LLaMA: Open and Efficient Foundation Language Models | |
| LinkBERT (large) | - | LinkBERT: Pretraining Language Models with Document Links | |
| RankRAG-llama3-70b (zero-shot, KILT) | 86.5 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | - |
| FiE+PAQ | 72.6 | FiE: Building a Global Probability Space by Leveraging Early Fusion in Encoder for Open-Domain Question Answering | - |
| ReasonBERTB | - | ReasonBERT: Pre-trained to Reason with Distant Supervision | |
| DyREx | - | DyREx: Dynamic Query Representation for Extractive Question Answering | |
| Mnemonic Reader | 46.94 | Reinforced Mnemonic Reader for Machine Reading Comprehension | |
| Claude 2 (few-shot, k=5) | 87.5 | Model Card and Evaluations for Claude Models | - |
| BigBird-etc | - | Big Bird: Transformers for Longer Sequences | |
| GPT-4-0613 (zero-shot) | 84.8 | GPT-4 Technical Report | |
| PaLM 2-S (one-shot) | 75.2 | PaLM 2 Technical Report | |
| Claude Instant 1.1 (few-shot, k=5) | 78.9 | Model Card and Evaluations for Claude Models | - |