HyperAI
Math Word Problem Solving on MATH
Metrics
Accuracy
Results
Performance results of various models on this benchmark
| Model Name | Accuracy (%) | Paper Title |
| --- | --- | --- |
| Gemini 2.0 Flash Experimental | 89.7 | - |
| Qwen2.5-Math-72B-Instruct (TIR, Greedy) | 88.1 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement |
| GPT-4 Turbo (MACM, w/ code, voting) | 87.920 | MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems |
| Qwen2.5-Math-72B-Instruct (COT, Greedy) | 85.9 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement |
| Qwen2.5-Math-7B-Instruct (TIR, Greedy) | 85.2 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement |
| GPT-4-code model (CSV, w/ code, SC, k=16) | 84.3 | Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification |
| Qwen2-Math-72B-Instruct (Greedy) | 84.0 | Qwen2 Technical Report |
| Qwen2.5-Math-7B-Instruct (COT, Greedy) | 83.6 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement |
| Qwen2.5-Math-1.5B-Instruct (TIR, Greedy) | 79.9 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement |
| OpenMath2-Llama3.1-70B (majority@256) | 79.6 | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data |
| OpenMath2-Llama3.1-8B (majority@256) | 76.1 | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data |
| Qwen2.5-Math-1.5B-Instruct (COT, Greedy) | 75.8 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement |
| GPT-4-code model (CSV, w/ code) | 73.5 | Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification |
| CR (GPT-4-turbo model, w/ code) | 72.2 | Cumulative Reasoning with Large Language Models |
| OpenMath2-Llama3.1-70B | 71.9 | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data |
| LogicNet (with code interpreter) | 71.2 | Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification |
| Qwen2-72B-Instruct-Step-DPO (0-shot CoT, w/o code) | 70.8 | Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs |
| GPT-4-code model (w/ code) | 69.7 | Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification |
| OpenMath2-Llama3.1-8B | 67.8 | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data |
| AlphaMath-7B-SBS@3 | 66.3 | AlphaMath Almost Zero: Process Supervision without Process |
Showing 20 of 135 entries.