A Quantifiable Definition of AGI Based on Human Cognitive Frameworks and Psychometric Evaluation
The absence of a clear, measurable definition of Artificial General Intelligence (AGI) has long hindered meaningful progress and discourse in AI development. To address this, a collaborative effort by researchers including Dan Hendrycks, Dawn Song, Christian Szegedy, Honglak Lee, Yarin Gal, Erik Brynjolfsson, Sharon Li, Andy Zou, Lionel Levine, Bo Han, Jie Fu, Ziwei Liu, Jinwoo Shin, Kimin Lee, Mantas Mazeika, Long Phan, George Ingebretsen, Adam Khoja, Cihang Xie, Olawale Salaudeen, Matthias Hein, Kevin Zhao, Alexander Pan, David Duvenaud, Bo Li, Steve Omohundro, Gabriel Alfour, Max Tegmark, Kevin McGrew, Gary Marcus, Jaan Tallinn, Eric Schmidt, and Yoshua Bengio proposes a concrete, quantifiable framework for AGI. The authors define AGI not as a vague aspiration but as a system that matches the cognitive versatility and proficiency of a well-educated adult human across a broad range of intellectual tasks.

This definition is grounded in the Cattell-Horn-Carroll (CHC) theory of human intelligence, the most empirically validated model of human cognition, which organizes intelligence into ten core domains: fluid reasoning, crystallized knowledge, visual processing, auditory processing, short-term memory, long-term storage and retrieval, processing speed, decision making, quantitative reasoning, and reading and writing.

To operationalize this framework, the researchers adapt established human psychometric assessments, the standardized tests used in psychology, to evaluate AI systems. The tests are restructured to be computationally executable, allowing objective measurement of AI performance across the ten cognitive domains. The resulting scores are normalized to a scale on which 100% represents full human-level performance. Applied to current models, the framework reveals a highly "jagged" cognitive profile.
While systems like GPT-4 demonstrate strong performance in knowledge-intensive domains such as reasoning and vocabulary, they exhibit significant weaknesses in foundational cognitive functions, particularly long-term memory storage and retrieval. This imbalance highlights a critical gap between narrow AI capabilities and the integrated, robust intelligence seen in humans. On this evaluation, GPT-4 achieves an AGI score of approximately 27%, while GPT-5 reaches around 57%. These figures provide a transparent, data-driven metric for tracking progress toward AGI, underscoring both the rapid advancement of AI and the substantial distance remaining before systems match the breadth and depth of human cognition. By introducing a standardized, empirically grounded approach, the authors aim to foster clearer benchmarks, more informed policy discussions, and better alignment between AI development and human cognitive capabilities. The framework not only defines AGI but also offers a roadmap for measuring its emergence.
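The scoring scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's actual methodology: the domain names follow the list given earlier, the equal weighting across the ten domains is an assumption of this sketch, and the example profile values are invented purely to mimic the "jagged" pattern the framework reveals.

```python
from typing import Dict

# The ten CHC-inspired cognitive domains listed in the summary above.
DOMAINS = [
    "fluid reasoning",
    "crystallized knowledge",
    "visual processing",
    "auditory processing",
    "short-term memory",
    "long-term storage and retrieval",
    "processing speed",
    "decision making",
    "quantitative reasoning",
    "reading and writing",
]

def agi_score(domain_scores: Dict[str, float]) -> float:
    """Aggregate per-domain scores into one overall AGI percentage.

    Each score is on a 0-100 scale, where 100 corresponds to the
    performance of a well-educated adult human in that domain.
    Equal weighting of the ten domains is an assumption here.
    """
    missing = set(DOMAINS) - set(domain_scores)
    if missing:
        raise ValueError(f"missing domain scores: {sorted(missing)}")
    return sum(domain_scores[d] for d in DOMAINS) / len(DOMAINS)

# Hypothetical "jagged" profile: strong in knowledge-heavy domains,
# very weak in long-term storage and retrieval (values are invented).
jagged = {
    "fluid reasoning": 95.0,
    "crystallized knowledge": 98.0,
    "visual processing": 50.0,
    "auditory processing": 60.0,
    "short-term memory": 70.0,
    "long-term storage and retrieval": 5.0,
    "processing speed": 80.0,
    "decision making": 40.0,
    "quantitative reasoning": 85.0,
    "reading and writing": 90.0,
}

print(agi_score(jagged))  # a single headline number hides the jaggedness
```

The point the sketch makes concrete is that a single aggregate percentage can look moderate even when one foundational domain is near zero, which is exactly why the authors emphasize the full ten-domain profile rather than the headline score alone.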
