Slot Filling on KILT T-REx

Accuracy

KILT-AC

KILT-F1

R-Prec

Recall@5

Evaluation Results

Performance of each model on this benchmark

Model	Accuracy	F1	KILT-AC	KILT-F1	R-Prec	Recall@5	Paper Title
Re2G	87.68	89.93	75.84	77.05	80.7	89.0	Re2G: Retrieve, Rerank, Generate
KGI_1	84.36	87.24	69.14	70.58	74.36	83.14	-
single ngram	83.72	86.53	60.08	61.72	67.8	81.52	-
Wikipedia	81.34	84.46	64.64	66.64	75.64	87.57	-
MetaRAG	78.66	81.71	61.88	63.09	66.36	76.24	-
KGI_0 (reupload)	77.9	81.31	55.54	56.79	59.7	70.38	-
RAG	59.2	62.96	23.12	23.94	28.68	33.04	-
BART + DPR	59.16	62.76	11.12	11.41	13.26	17.04	-
Sphere	57.02	61.46	0.0	0.0	0.0	0.0	-
10k	53.9	61.74	27.84	32.34	37.62	40.07	-
DensePhrases	53.9	61.74	27.84	32.34	37.62	40.07	Learning Dense Representations of Phrases at Scale
Coop. DistilBert	49.04	54.62	36.68	39.57	48.08	51.86	-
BART	45.06	49.24	0.0	0.0	0.0	0.0	-
T5-base	43.56	50.61	0.0	0.0	0.0	0.0	KILT: a Benchmark for Knowledge Intensive Language Tasks
multi-task small	19.3	25.81	0.0	0.0	0.0	0.0	-
GENRE	0.1	7.67	0.04	6.66	79.42	85.33	-
JivBest	0.02	2.04	0.0	0.0	0.0	0.0	-
Multi-task DPR	0.0	0.0	0.0	0.0	69.46	83.88	-
TABi	0.0	0.0	0.0	0.0	81.9	89.36	-
chriskuei	0.0	0.0	0.0	0.0	79.98	85.75	-
