Math Word Problem Solving on MATH
Metrics
Accuracy
Results
Performance results of various models on this benchmark
Comparison table
Model name | Accuracy (%) |
---|---|
mixtral-of-experts | 28.4 |
palm-2-technical-report-1 | 34.3 |
qwen2-5-math-technical-report-toward | 83.6 |
qwen2-5-math-technical-report-toward | 85.2 |
tora-a-tool-integrated-reasoning-agent-for | 56.9 |
solving-quantitative-reasoning-problems-with | 43.4 |
Model 7 | 64.5 |
solving-challenging-math-word-problems-using | 84.3 |
Model 9 | 89.7 |
an-empirical-study-of-data-ability-boundary | 55.0 |
cumulative-reasoning-with-large-language | 72.2 |
query-and-response-augmentation-cannot-help | 25.8 |
openchat-advancing-open-source-language | 28.6 |
llama-open-and-efficient-foundation-language-1 | 3.9 |
progressive-hint-prompting-improves-reasoning | 53.9 |
measuring-mathematical-problem-solving-with | 3.0 |
wizardmath-empowering-mathematical-reasoning | 33.0 |
math-shepherd-a-label-free-step-by-step | 43.5 |
query-and-response-augmentation-cannot-help | 35.6 |
dart-math-difficulty-aware-rejection-tuning-1 | 45.5 |
tora-a-tool-integrated-reasoning-agent-for | 60.0 |
tora-a-tool-integrated-reasoning-agent-for | 50.8 |
key-point-driven-data-synthesis-with-its | 48.8 |
solving-quantitative-reasoning-problems-with | 5.6 |
dart-math-difficulty-aware-rejection-tuning-1 | 45.3 |
key-point-driven-data-synthesis-with-its | 41 |
gemini-a-family-of-highly-capable-multimodal-1 | 32.6 |
openmathinstruct-2-accelerating-ai-for-math | 67.8 |
tora-a-tool-integrated-reasoning-agent-for | 49.7 |
dart-math-difficulty-aware-rejection-tuning-1 | 43.5 |
llama-open-and-efficient-foundation-language-1 | 8.8 |
solving-quantitative-reasoning-problems-with | 14.1 |
an-empirical-study-of-data-ability-boundary | 44.3 |
qwen2-5-math-technical-report-toward | 88.1 |
metamath-bootstrap-your-own-mathematical | 19.4 |
llama-open-and-efficient-foundation-language-1 | 2.9 |
solving-quantitative-reasoning-problems-with | 4.4 |
solving-quantitative-reasoning-problems-with | 50.3 |
augmenting-math-word-problems-via-iterative | 45.0 |
galactica-a-large-language-model-for-science-1 | 16.6 |
solving-quantitative-reasoning-problems-with | 47.6 |
qwen2-technical-report | 84.0 |
wizardmath-empowering-mathematical-reasoning | 14.0 |
palm-2-technical-report-1 | 48.8 |
openmathinstruct-2-accelerating-ai-for-math | 76.1 |
openmathinstruct-1-a-1-8-million-math | 43.6 |
measuring-mathematical-problem-solving-with | 6.9 |
gemini-a-family-of-highly-capable-multimodal-1 | 53.2 |
solving-challenging-math-word-problems-using | 71.2 |
measuring-mathematical-problem-solving-with | 5.4 |
deepseekmath-pushing-the-limits-of | 51.7 |
openmathinstruct-1-a-1-8-million-math | 48.3 |
measuring-mathematical-problem-solving-with | 6.4 |
dart-math-difficulty-aware-rejection-tuning-1 | 54.9 |
dart-math-difficulty-aware-rejection-tuning-1 | 56.1 |
solving-challenging-math-word-problems-using | 60.8 |
llama-open-and-efficient-foundation-language-1 | 6.9 |
an-empirical-study-of-data-ability-boundary | 49.5 |
wizardmath-empowering-mathematical-reasoning | 10.7 |
dart-math-difficulty-aware-rejection-tuning-1 | 53.6 |
galactica-a-large-language-model-for-science-1 | 20.4 |
solving-quantitative-reasoning-problems-with | 25.4 |
metamath-bootstrap-your-own-mathematical | 22.5 |
qwen2-5-math-technical-report-toward | 75.8 |
parameter-efficient-sparsity-crafting-from | 29.9 |
qwen2-5-math-technical-report-toward | 85.9 |
parameter-efficient-sparsity-crafting-from | 22.6 |
llama-open-and-efficient-foundation-language-1 | 10.6 |
metamath-bootstrap-your-own-mathematical | 26.0 |
Model 70 | 41.8 |
openmathinstruct-1-a-1-8-million-math | 60.4 |
mistral-7b | 13.1 |
galactica-a-large-language-model-for-science-1 | 33.6 |
cumulative-reasoning-with-large-language | 58.0 |
galactica-a-large-language-model-for-science-1 | 11.4 |
solving-quantitative-reasoning-problems-with | 64.9 |
openchat-advancing-open-source-language | 28.9 |
measuring-mathematical-problem-solving-with | 6.2 |
openmathinstruct-1-a-1-8-million-math | 57.6 |
branch-train-mix-mixing-expert-llms-into-a | 17.8 |
query-and-response-augmentation-cannot-help | 30.7 |
measuring-mathematical-problem-solving-with | 5.2 |
mathcoder-seamless-code-integration-in-llms | 30.2 |
dart-math-difficulty-aware-rejection-tuning-1 | 46.6 |
mathcoder-seamless-code-integration-in-llms | 45.2 |
openmathinstruct-1-a-1-8-million-math | 58.3 |
measuring-mathematical-problem-solving-with | 5.6 |
galactica-a-large-language-model-for-science-1 | 8.8 |
alphamath-almost-zero-process-supervision | 66.3 |
macm-utilizing-a-multi-agent-system-for | 87.920 |
llama-open-and-efficient-foundation-language-1 | 7.1 |
tora-a-tool-integrated-reasoning-agent-for | 44.6 |
solving-quantitative-reasoning-problems-with | 27.6 |
math-shepherd-a-label-free-step-by-step | 48.1 |
tora-a-tool-integrated-reasoning-agent-for | 40.1 |
wizardmath-empowering-mathematical-reasoning | 22.7 |
deepseekmath-pushing-the-limits-of | 58.8 |
mixtral-of-experts | 12.7 |
openmathinstruct-1-a-1-8-million-math | 45.5 |
openmathinstruct-1-a-1-8-million-math | 57.2 |
tora-a-tool-integrated-reasoning-agent-for | 43.0 |
tora-a-tool-integrated-reasoning-agent-for | 48.1 |
openmathinstruct-1-a-1-8-million-math | 44.5 |
solving-quantitative-reasoning-problems-with | 33.6 |
mathcoder-seamless-code-integration-in-llms | 35.9 |
solving-challenging-math-word-problems-using | 69.7 |
pal-program-aided-language-models | 51.8 |
measuring-mathematical-problem-solving-with | 2.9 |
key-point-driven-data-synthesis-with-its | 48.6 |
math-shepherd-a-label-free-step-by-step | 33.0 |
mathcoder-seamless-code-integration-in-llms | 45.1 |
step-dpo-step-wise-preference-optimization | 70.8 |
mathcoder-seamless-code-integration-in-llms | 23.3 |
openmathinstruct-1-a-1-8-million-math | 55.6 |
solving-challenging-math-word-problems-using | 73.5 |
llama-open-and-efficient-foundation-language-1 | 20.5 |
skills-in-context-prompting-unlocking | 56.4 |
llama-open-and-efficient-foundation-language-1 | 15.2 |
openmathinstruct-1-a-1-8-million-math | 50.7 |
an-empirical-study-of-data-ability-boundary | 63.7 |
solving-quantitative-reasoning-problems-with | 8.8 |
galactica-a-large-language-model-for-science-1 | 5.2 |
openmathinstruct-1-a-1-8-million-math | 60.2 |
key-point-driven-data-synthesis-with-its | 46.8 |
sparks-of-artificial-general-intelligence | 42.5 |
mathcoder-seamless-code-integration-in-llms | 29.9 |
openmathinstruct-2-accelerating-ai-for-math | 79.6 |
toward-self-improvement-of-llms-via | 51 |
solving-quantitative-reasoning-problems-with | 19.1 |
qwen2-5-math-technical-report-toward | 79.9 |
solving-quantitative-reasoning-problems-with | 1.5 |
openmathinstruct-1-a-1-8-million-math | 46.3 |
openmathinstruct-2-accelerating-ai-for-math | 71.9 |
galactica-a-large-language-model-for-science-1 | 12.7 |
dart-math-difficulty-aware-rejection-tuning-1 | 52.9 |