HyperAI超神経

Mathematical Reasoning on AIME24

Evaluation Metrics

Acc

Evaluation Results

Performance results of each model on this benchmark

| Model | Acc (%) | Paper Title | Repository |
| --- | --- | --- | --- |
| Qwen2.5-72B-Instruct | 23.3 | Qwen2.5 Technical Report | |
| Search-o1 | 56.7 | Search-o1: Agentic Search-Enhanced Large Reasoning Models | |
| Openai-o1 | 74.4 | - | - |
| Openai-o1-preview | 44.6 | - | - |
| Claude3.5-Sonnet | 16 | - | - |
| Openai-o1-mini | 70.0 | - | - |
| s1-32B | 56.7 | s1: Simple test-time scaling | |
| DeepSeek-r1 | 79.8 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | |