HyperAI

Logical Reasoning On Big Bench Formal

Metrics

Accuracy

Results

Performance results of various models on this benchmark

| Model name | Accuracy | Paper Title | Repository |
| --- | --- | --- | --- |
| Gopher-280B (few-shot, k=5) | 50.7 | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | |
| PaLM 540B (few-shot, k=3) | 53.6 | BloombergGPT: A Large Language Model for Finance | - |
| GPT-NeoX 20B (few-shot, k=3) | 52.8 | BloombergGPT: A Large Language Model for Finance | - |
| OPT 66B (few-shot, k=3) | 54 | BloombergGPT: A Large Language Model for Finance | - |
| PaLM 2 (few-shot, k=3, Direct) | 64.8 | PaLM 2 Technical Report | |
| BLOOM 176B (few-shot, k=3) | 52.8 | BloombergGPT: A Large Language Model for Finance | - |
| Chinchilla-70B (few-shot, k=5) | 52.1 | Training Compute-Optimal Large Language Models | |
| Bloomberg GPT 50B (few-shot, k=3) | 50.8 | BloombergGPT: A Large Language Model for Finance | - |
| PaLM 2 (few-shot, k=3, CoT) | 57.2 | PaLM 2 Technical Report | |