Multi-Task Language Understanding on BBH-alg
Metrics
Average (%)
Results
Performance of various models on this benchmark:
Paper | Average (%) |
---|---|
scaling-instruction-finetuned-language-models | 61.3 |
evaluating-large-language-models-trained-on | 73.9 |
scaling-instruction-finetuned-language-models | 57.6 |
scaling-instruction-finetuned-language-models | 66.5 |
scaling-instruction-finetuned-language-models | 38.3 |
scaling-instruction-finetuned-language-models | 48.2 |
scaling-instruction-finetuned-language-models | 62.2 |