HyperAI

Multi-Task Language Understanding on BBH-alg

Metrics

Average (%)

Results

Performance results of various models on this benchmark

Comparison Table
Model Name                                     Average (%)
scaling-instruction-finetuned-language-models  61.3
evaluating-large-language-models-trained-on    73.9
scaling-instruction-finetuned-language-models  57.6
scaling-instruction-finetuned-language-models  66.5
scaling-instruction-finetuned-language-models  38.3
scaling-instruction-finetuned-language-models  48.2
scaling-instruction-finetuned-language-models  62.2