Multi-Task Language Understanding on BBH-alg

Evaluation Metrics

Average (%)

Evaluation Results

Performance of each model on this benchmark

| Model Name | Average (%) | Paper Title | Repository |
|---|---|---|---|
| Flan-PaLM 540B (3-shot, fine-tuned, CoT) | 61.3 | Scaling Instruction-Finetuned Language Models | - |
| code-davinci-002 175B (CoT) | 73.9 | Evaluating Large Language Models Trained on Code | - |
| PaLM 540B (CoT) | 57.6 | Scaling Instruction-Finetuned Language Models | - |
| Flan-PaLM 540B (3-shot, fine-tuned, CoT + SC) | 66.5 | Scaling Instruction-Finetuned Language Models | - |
| PaLM 540B | 38.3 | Scaling Instruction-Finetuned Language Models | - |
| Flan-PaLM 540B (3-shot, fine-tuned) | 48.2 | Scaling Instruction-Finetuned Language Models | - |
| PaLM 540B (CoT + self-consistency) | 62.2 | Scaling Instruction-Finetuned Language Models | - |