HyperAI

Multi-Task Language Understanding on BBH-alg

Metrics

Average (%)

Results

Performance results of various models on this benchmark

Comparison table
Model name                                       Average (%)
scaling-instruction-finetuned-language-models    61.3
evaluating-large-language-models-trained-on      73.9
scaling-instruction-finetuned-language-models    57.6
scaling-instruction-finetuned-language-models    66.5
scaling-instruction-finetuned-language-models    38.3
scaling-instruction-finetuned-language-models    48.2
scaling-instruction-finetuned-language-models    62.2