HyperAI
Math Word Problem Solving On Math
Evaluation metric: Accuracy (%)
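Accuracy on this benchmark is the percentage of test problems for which a model's final answer matches the reference answer. The sketch below assumes both answers are already-normalized strings. Real MATH scoring also normalizes LaTeX, for example by extracting the boxed final answer and canonicalizing fractions, and this sketch does none of that. The function name `accuracy` is hypothetical.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of problems whose predicted final answer equals the reference.

    Assumption: answers are already normalized strings; only surrounding
    whitespace is stripped here (no LaTeX or numeric normalization).
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one problem")
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    preds = ["3", "\\frac{1}{2}", "7", "12"]
    refs = ["3", "\\frac{1}{2}", "8", "12"]
    print(f"{accuracy(preds, refs):.1f}")  # 75.0 (3 of 4 correct)
```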
Evaluation results
Performance of each model on this benchmark.
Model | Accuracy (%) | Paper title
--- | --- | ---
Gemini 2.0 Flash Experimental | 89.7 | -
Qwen2.5-Math-72B-Instruct(TIR,Greedy) | 88.1 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
GPT-4 Turbo (MACM, w/code, voting) | 87.920 | MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems
Qwen2.5-Math-72B-Instruct(COT,Greedy) | 85.9 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
Qwen2.5-Math-7B-Instruct(TIR,Greedy) | 85.2 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
GPT-4-code model (CSV, w/ code, SC, k=16) | 84.3 | Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
Qwen2-Math-72B-Instruct(greedy) | 84.0 | Qwen2 Technical Report
Qwen2.5-Math-7B-Instruct(COT,Greedy) | 83.6 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
Qwen2.5-Math-1.5B-Instruct(TIR,Greedy) | 79.9 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
OpenMath2-Llama3.1-70B (majority@256) | 79.6 | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
OpenMath2-Llama3.1-8B (majority@256) | 76.1 | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
Qwen2.5-Math-1.5B-Instruct(COT,Greedy) | 75.8 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
GPT-4-code model (CSV, w/ code) | 73.5 | Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
CR (GPT-4-turbo model, w/ code) | 72.2 | Cumulative Reasoning with Large Language Models
OpenMath2-Llama3.1-70B | 71.9 | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
LogicNet (with code interpreter) | 71.2 | Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
Qwen2-72B-Instruct-Step-DPO (0-shot CoT, w/o code) | 70.8 | Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
GPT-4-code model (w/ code) | 69.7 | Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
OpenMath2-Llama3.1-8B | 67.8 | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
AlphaMath-7B-SBS@3 | 66.3 | AlphaMath Almost Zero: Process Supervision without Process
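Several entries, such as "majority@256", "SC, k=16" and "voting", sample many solutions per problem and aggregate their final answers instead of decoding once greedily. The sketch below shows the simplest form of this: a plain, unweighted majority vote (self-consistency) followed by scoring. It assumes answers are already-normalized strings. Some listed methods weight votes differently; the CSV entries, for instance, use verification results, and that weighting is not modeled here. The names `majority_vote` and `majority_at_k_accuracy` are hypothetical.

```python
from collections import Counter
from typing import Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent final answer among k sampled solutions.

    Ties go to the answer encountered first, because Counter.most_common
    orders equal counts by first insertion.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(a.strip() for a in answers)
    answer, _ = counts.most_common(1)[0]
    return answer


def majority_at_k_accuracy(
    samples_per_problem: Sequence[Sequence[str]], references: Sequence[str]
) -> float:
    """Accuracy (%) when each problem is answered by a majority vote over its samples."""
    if len(samples_per_problem) != len(references) or not references:
        raise ValueError("need one non-empty sample list per reference")
    correct = sum(
        majority_vote(samples) == ref.strip()
        for samples, ref in zip(samples_per_problem, references)
    )
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    samples = [["12", "15", "12"], ["1/2", "1/3", "1/3"]]
    refs = ["12", "1/2"]
    print(majority_at_k_accuracy(samples, refs))  # 50.0
```

Voting helps only when the model's errors are scattered across different wrong answers. If most samples converge on the same wrong answer, as in the second toy problem above, the vote simply locks it in.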
Showing 20 of 135 entries.