HyperAI

Arithmetic Reasoning on GSM8K

Metrics

Accuracy
Parameters (Billion)

Results

Performance results of various models on this benchmark

| Model Name | Accuracy | Parameters (Billion) | Paper Title | Repository |
| --- | --- | --- | --- | --- |
| code-davinci-002 175B (LEVER, 8-shot) | 84.5 | 175 | LEVER: Learning to Verify Language-to-Code Generation with Execution | |
| GPT-2-Medium 355M + question-solution classifier (BS=1) | 16.8 | 0.355 | Composing Ensembles of Pre-trained Models via Iterative Consensus | - |
| ToRA-Code 7B | 72.6 | 7 | ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving | |
| OpenMath-CodeLlama-7B (w/ code) | 75.9 | 7 | OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | |
| Gemini Pro (maj1@32) | 86.5 | - | Gemini: A Family of Highly Capable Multimodal Models | |
| U-PaLM | 58.5 | 540 | Transcending Scaling Laws with 0.1% Extra Compute | - |
| MathCoder-CL-13B | 74.1 | 7 | MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | |
| OpenMath2-Llama3.1-70B | 94.9 | - | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data | |
| ChatGPT (Ask, Refine, Trust) | 82.6 | - | The ART of LLM Refinement: Ask, Refine, and Trust | - |
| Camelidae-8×34B (5-shot) | 78.3 | - | Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | |
| Orca-Math 7B (fine-tuned) | 86.8 | 7 | Orca-Math: Unlocking the potential of SLMs in Grade School Math | - |
| PaLM 540B (Self Consistency) | 74.4 | 540 | Large Language Models Can Self-Improve | - |
| LLaMA 13B | 17.8 | 13 | LLaMA: Open and Efficient Foundation Language Models | |
| MMOS-CODE-7B(0-shot) | 73.9 | 7 | An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | |
| OVM-Mistral-7B (verify100@1) | 84.7 | 7 | OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning | |
| Minerva 62B (maj5@100) | 89 | 62 | Solving Quantitative Reasoning Problems with Language Models | |
| Jiutian-大模型 | 95.2 | 75 | - | - |
| DIVERSE 175B (8-shot) | 83.2 | 175 | Making Large Language Models Better Reasoners with Step-Aware Verifier | - |
| LLaMA 33B-maj1@k | 53.1 | 33 | LLaMA: Open and Efficient Foundation Language Models | |
| DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code) | 82.5 | 8 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | |