HyperAI

Multi-Task Language Understanding on BBH-alg

Metrics

Average (%)

Results

Performance results of various models on this benchmark

Comparison table
Model name                                     Average (%)
scaling-instruction-finetuned-language-models  61.3
evaluating-large-language-models-trained-on    73.9
scaling-instruction-finetuned-language-models  57.6
scaling-instruction-finetuned-language-models  66.5
scaling-instruction-finetuned-language-models  38.3
scaling-instruction-finetuned-language-models  48.2
scaling-instruction-finetuned-language-models  62.2