HyperAI

Multi-Task Language Understanding on BBH-NLP

Metrics

Average (%)

Results

Performance results of various models on this benchmark

| Model name | Average (%) | Paper Title | Repository |
|---|---|---|---|
| Qwen2.5-72B | 86.3 | - | - |
| PaLM 540B (CoT) | 71.2 | Scaling Instruction-Finetuned Language Models | - |
| Orca 2-7B | 45.93 | Orca 2: Teaching Small Language Models How to Reason | - |
| PaLM 540B | 62.7 | Scaling Instruction-Finetuned Language Models | - |
| Flan-PaLM 540B (5-shot, fine-tuned) | 70.0 | Scaling Instruction-Finetuned Language Models | - |
| Flan-PaLM 540B (3-shot, fine-tuned, CoT + SC) | 78.4 | Scaling Instruction-Finetuned Language Models | - |
| PaLM 540B (CoT + self-consistency) | 78.2 | Scaling Instruction-Finetuned Language Models | - |
| Orca 2-13B | 50.18 | Orca 2: Teaching Small Language Models How to Reason | - |
| code-davinci-002 175B (CoT) | 73.5 | Evaluating Large Language Models Trained on Code | - |
| Qwen2-72B | 82.4 | - | - |
| Jiutian-大模型 | 86.1 | - | - |
| Flan-PaLM 540B (3-shot, fine-tuned, CoT) | 72.4 | Scaling Instruction-Finetuned Language Models | - |
| Llama-3-405B | 85.9 | - | - |
| Jiutian-57B | 84.07 | - | - |
| Llama-3-70B | 81.0 | - | - |
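The "Average (%)" column is a mean accuracy across the benchmark's subtasks, reported as a percentage. The snippet below is a minimal sketch of that kind of aggregation, assuming each subtask yields an accuracy in [0, 1] and an unweighted mean; the function name `bbh_average` and the task names and scores are illustrative placeholders, not the benchmark's official evaluation code.

```python
# Minimal sketch: aggregate per-task accuracies into an "Average (%)" score.
# Assumption: the leaderboard value is the unweighted mean over subtask
# accuracies, expressed as a percentage. All names/values below are hypothetical.

def bbh_average(task_accuracies: dict[str, float]) -> float:
    """Return the unweighted mean accuracy across subtasks, in percent."""
    if not task_accuracies:
        raise ValueError("no task scores provided")
    return 100.0 * sum(task_accuracies.values()) / len(task_accuracies)

if __name__ == "__main__":
    example_scores = {
        "boolean_expressions": 0.84,  # hypothetical per-task accuracy
        "causal_judgement": 0.71,     # hypothetical per-task accuracy
        "date_understanding": 0.79,   # hypothetical per-task accuracy
    }
    print(f"Average (%): {bbh_average(example_scores):.1f}")
```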