HyperAI
Question Answering on OpenBookQA
Evaluation metric: Accuracy

Evaluation results: performance of each model on this benchmark.
| Model | Accuracy | Paper Title | Repository |
|---|---|---|---|
| GPT-4 + knowledge base | 95.9 | - | - |
| MVP-Tuning (ensemble) | 95.2 | - | - |
| PaLM 540B (Self Improvement, Self Consistency) | 94.4 | Large Language Models Can Self-Improve | - |
| X-Reasoner | 94.2 | - | - |
| PaLM 540B (Self Improvement, CoT Prompting) | 93 | Large Language Models Can Self-Improve | - |
| PaLM 540B (Self Improvement, Standard-Prompting) | 92 | Large Language Models Can Self-Improve | - |
| DeBERTa-xxlarge 1.5B + MVP-Tuning | 91.3 | - | - |
| GrapeQA: PEGA+CANP | 90 | GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering | - |
| PaLM 540B (Self Consistency) | 90 | Large Language Models Can Self-Improve | - |
| GenMC 11B | 89.8 | Clues Before Answers: Generation-Enhanced Multiple-Choice QA | |
| AristoRoBERTa + MVP-Tuning | 87.6 | - | - |
| AristoRoBERTa + Graph Soft Counter | 87.4 | GNN is a Counter? Revisiting GNN for Question Answering | - |
| UnifiedQA 11B | 87.2 | UnifiedQA: Crossing Format Boundaries With a Single QA System | |
| LLaMA-3 8B + MoSLoRA | 86.8 | Mixture-of-Subspaces in Low-Rank Adaptation | |
| PaLM 540B (CoT Prompting) | 86.4 | Large Language Models Can Self-Improve | - |
| LLaMA-3 8B + MixLoRA | 84.8 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | |
| PaLM 540B (Standard-Prompting) | 84.4 | Large Language Models Can Self-Improve | - |
| TTTTT 3B | 83.2 | Fusing Context Into Knowledge Graph for Commonsense Question Answering | |
| LLaMA-2 13B + MixLoRA | 83 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | |
| QA-GNN | 82.8 | QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering | |