HyperAI
Arithmetic Reasoning on GSM8K
Metrics: Accuracy, Parameters (Billion)
Results
Performance results of various models on this benchmark:
| Model name | Accuracy | Parameters (Billion) | Paper title |
| --- | --- | --- | --- |
| Claude 3.5 Sonnet (HPT) | 97.72 | - | Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models |
| Qwen2-Math-72B-Instruct (greedy) | 96.7 | 72 | Qwen2 Technical Report |
| SFT-Mistral-7B (MetaMath, OVM, Smart Ensemble) | 96.4 | 7 | - |
| OpenMath2-Llama3.1-70B (majority@256) | 96.0 | - | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data |
| Jiutian-大模型 | 95.2 | 75 | - |
| DAMOMath-7B (MetaMath, OVM, BS, Ensemble) | 95.1 | 7 | - |
| Claude 3 Opus (0-shot chain-of-thought) | 95 | - | The Claude 3 Model Family: Opus, Sonnet, Haiku |
| OpenMath2-Llama3.1-70B | 94.9 | - | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data |
| GPT-4 (Teaching-Inspired) | 94.8 | - | Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models |
| SFT-Mistral-7B (MetaMath + OVM + Ensemble) | 94.13 | 7 | - |
| OpenMath2-Llama3.1-8B (majority@256) | 94.1 | - | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data |
| Qwen2-72B-Instruct-Step-DPO (0-shot CoT) | 94.0 | - | Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs |
| DAMOMath-7B (MetaMath, OVM, Ensemble) | 93.2 | 7 | - |
| Claude 3 Sonnet (0-shot chain-of-thought) | 92.3 | - | The Claude 3 Model Family: Opus, Sonnet, Haiku |
| AlphaLLM (with MCTS) | 92 | 70 | Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing |
| OpenMath2-Llama3.1-8B | 91.7 | - | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data |
| PaLM 2 (few-shot, k=8, SC) | 91.0 | - | PaLM 2 Technical Report |
| GaC (Qwen2-72B-Instruct + Llama-3-70B-Instruct) | 90.91 | - | Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling |
| OpenMath-CodeLlama-70B (w/ code, SC, k=50) | 90.8 | 70 | OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset |
| DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code) | 90.4 | 70 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
Showing the top 20 of 160 results.