Math Word Problem Solving On Math
Evaluation Metric
Accuracy
Results
Performance of each model on this benchmark
| Model | Accuracy (%) | Paper Title |
|---|---|---|
| Gemini 2.0 Flash Experimental | 89.7 | - |
| Qwen2.5-Math-72B-Instruct (TIR, Greedy) | 88.1 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement |
| GPT-4 Turbo (MACM, w/code, voting) | 87.920 | MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems |
| Qwen2.5-Math-72B-Instruct (COT, Greedy) | 85.9 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement |
| Qwen2.5-Math-7B-Instruct (TIR, Greedy) | 85.2 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement |
| GPT-4-code model (CSV, w/ code, SC, k=16) | 84.3 | Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification |
| Qwen2-Math-72B-Instruct (greedy) | 84.0 | Qwen2 Technical Report |
| Qwen2.5-Math-7B-Instruct (COT, Greedy) | 83.6 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement |
| Qwen2.5-Math-1.5B-Instruct (TIR, Greedy) | 79.9 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement |
| OpenMath2-Llama3.1-70B (majority@256) | 79.6 | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data |
| OpenMath2-Llama3.1-8B (majority@256) | 76.1 | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data |
| Qwen2.5-Math-1.5B-Instruct (COT, Greedy) | 75.8 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement |
| GPT-4-code model (CSV, w/ code) | 73.5 | Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification |
| CR (GPT-4-turbo model, w/ code) | 72.2 | Cumulative Reasoning with Large Language Models |
| OpenMath2-Llama3.1-70B | 71.9 | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data |
| LogicNet (with code interpreter) | 71.2 | Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification |
| Qwen2-72B-Instruct-Step-DPO (0-shot CoT, w/o code) | 70.8 | Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs |
| GPT-4-code model (w/ code) | 69.7 | Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification |
| OpenMath2-Llama3.1-8B | 67.8 | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data |
| AlphaMath-7B-SBS@3 | 66.3 | AlphaMath Almost Zero: Process Supervision without Process |