Tsinghua Student Optimizes Transformer: 27 Million Parameters Outperform Claude
A post-2000s Tsinghua University student has once again made waves in the world of Transformer models, developing a system called HRM that has only 27 million parameters yet outperforms much larger models such as o3 and Claude.

HRM stands out for its computational efficiency. Unlike traditional Transformer models, which rely on expansive neural network architectures requiring significant memory and time, HRM maintains computational universality even under strict resource constraints: in principle it can simulate any Turing machine, overcoming the computational-depth limitations of the standard Transformer. Despite the depth limits typical of conventional Transformer stacks, HRM can still carry out effectively deep computation (a rough sketch of how a small model can do this appears at the end of this post).

One key feature of HRM is that it reasons naturally in continuous space, without needing explicit human-annotated reasoning chains. Reinforcement learning, the widely adopted alternative for training reasoners, has recently been shown to primarily exploit a model's existing reasoning capabilities rather than discover entirely new mechanisms, and it also tends to be unstable and data-inefficient. HRM instead relies on a dense, gradient-based supervision signal, which provides more stable and efficient feedback than the sparse reward signals used in reinforcement learning (a toy comparison of the two feedback regimes also appears at the end of this post).

In addition, operating autonomously in continuous space not only improves biological plausibility but also lets HRM allocate computational resources dynamically according to the complexity of the reasoning task, rather than spending the same amount of computation on every token.

These innovations highlight HRM's potential for advancing computational and reasoning systems, particularly in practical applications where resource efficiency is crucial.

For further reading:
- Google Scholar
- Guan Wang's LinkedIn
- Austin Zhen's LinkedIn
- ArXiv Paper

Operated and curated by Wu Long.
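The article does not spell out HRM's internals, so the following is only a minimal sketch of how a small recurrent model can reach large effective computational depth: a slow high-level module and a fast low-level module are nested, so N high-level steps with T low-level steps each give an effective depth of N*T without adding parameters. The module types (GRU cells), sizes, and step counts are illustrative assumptions, not HRM's actual architecture.

```python
# Minimal sketch (not the authors' code): two coupled recurrent modules,
# a slow "high-level" planner and a fast "low-level" worker. The low-level
# module runs T inner steps for every high-level step, so N outer steps give
# an effective depth of N*T with a fixed, small parameter count.

import torch
import torch.nn as nn


class HierarchicalRecurrentSketch(nn.Module):
    def __init__(self, input_dim=64, hidden_dim=128, output_dim=10,
                 high_steps=4, low_steps=8):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.high_steps = high_steps   # N: slow, abstract update steps
        self.low_steps = low_steps     # T: fast, detailed update steps
        # The low-level cell sees the input plus the current high-level state.
        self.low_cell = nn.GRUCell(input_dim + hidden_dim, hidden_dim)
        # The high-level cell is updated from the low-level module's result.
        self.high_cell = nn.GRUCell(hidden_dim, hidden_dim)
        self.readout = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        batch = x.shape[0]
        z_low = x.new_zeros(batch, self.hidden_dim)
        z_high = x.new_zeros(batch, self.hidden_dim)
        for _ in range(self.high_steps):
            # Fast inner loop: refine the low-level state under the
            # current high-level "plan".
            for _ in range(self.low_steps):
                z_low = self.low_cell(torch.cat([x, z_high], dim=-1), z_low)
            # Slow outer update: the high-level state absorbs the result.
            z_high = self.high_cell(z_low, z_high)
        return self.readout(z_high)


if __name__ == "__main__":
    model = HierarchicalRecurrentSketch()
    out = model(torch.randn(2, 64))
    print(out.shape)  # torch.Size([2, 10])
```

The point of the nesting is that depth grows with the number of iterations rather than with parameter count, which is how a 27-million-parameter model can, in principle, perform far deeper computation than a fixed-depth stack of the same size; it also suggests how compute can be scaled up or down per problem by varying the number of iterations.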
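To make the feedback contrast concrete, here is a toy comparison (not HRM's training code; the model, data, and segment count are placeholders): a loss applied only at the end of a multi-step computation stands in for a sparse, delayed reward, while a loss applied at every segment stands in for the dense, gradient-based supervision described above.

```python
# Toy contrast between sparse, end-only feedback and dense, per-segment
# feedback. Purely illustrative; the model and data are placeholders.

import torch
import torch.nn as nn

cell = nn.GRUCell(32, 32)
head = nn.Linear(32, 5)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(16, 32)              # placeholder batch of inputs
target = torch.randint(0, 5, (16,))  # placeholder labels
segments = 4

# Sparse feedback: only the final state is scored, so every earlier segment
# learns from a single delayed signal (analogous to a terminal reward).
h = torch.zeros(16, 32)
for _ in range(segments):
    h = cell(x, h)
sparse_loss = loss_fn(head(h), target)

# Dense feedback: every segment is scored, so a supervision signal reaches
# each step of the computation directly.
h = torch.zeros(16, 32)
dense_loss = torch.zeros(())
for _ in range(segments):
    h = cell(x, h)
    dense_loss = dense_loss + loss_fn(head(h), target)
dense_loss = dense_loss / segments

dense_loss.backward()  # gradients flow from every segment's loss term
print(float(sparse_loss), float(dense_loss))
```

In the dense variant every segment receives a direct gradient, which is the basic reason such a signal tends to be more stable and data-efficient than a single delayed score.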