HyperAI

Question Answering on DROP

Metrics

Accuracy

Results

Performance results of various models on this benchmark

Comparison table
Model name                                Accuracy
large-language-models-can-self-improve    78.2
large-language-models-can-self-improve    83
large-language-models-can-self-improve    71.7
large-language-models-can-self-improve    60
large-language-models-can-self-improve    70.6
large-language-models-can-self-improve    76.2