HRM: Small Model Outperforms Giants in Reasoning
Most new AI model releases follow the same script: trillions of parameters, massive GPU clusters, astronomical energy costs, and a relentless push toward scale. GPT-5 fits right in. It is another colossal system, routing requests between shallow and deep thinking modes, delivering top-tier performance in coding and math, and reinforcing the idea that bigger is better. That’s the prevailing narrative: more compute, more parameters, more cost, and, supposedly, more intelligence.

Then comes something unexpected: the Hierarchical Reasoning Model, or HRM. It has just 27 million parameters, tiny by today’s standards, and its training data is small enough to work through in a weekend. Guan Wang, one of the model’s creators, claims it can be trained to solve expert-level Sudoku puzzles in just two GPU hours. Two hours. I’ve spent longer than that wrestling with CUDA drivers on a malfunctioning A100 (a tale I won’t soon forget).

And yet HRM doesn’t just run; it dominates. On ARC-AGI, one of the most challenging benchmarks for general reasoning in AI, HRM scored 40.3%, far ahead of Claude 3.7 (21.2%) and OpenAI’s o3-mini-high (34.5%). The real shock comes on harder tasks. On Sudoku-Extreme, puzzles that stump most models, HRM solved 55% of them; Claude and o3-mini solved none. The same pattern repeats on large 30×30 mazes: HRM found the optimal path 74.5% of the time, while its rivals achieved zero success.

This isn’t just a performance win; it points to a shift in paradigm. HRM suggests that intelligence isn’t solely a function of scale, and that smart architecture, efficient reasoning, and targeted training can outperform massive models that rely on brute force. In a world obsessed with size, HRM is a quiet revolution: a tiny brain that humbles giants.