HyperAI

Chinese Researcher Wins International Cognitive Science Award for Bridging AI and Human Psychological Measurement Gaps

At the 47th Annual Conference of the Cognitive Science Society (CogSci 2025), a groundbreaking study led by researchers from the National University of Defense Technology, the Institute of Information Engineering at the Chinese Academy of Sciences, and Nanyang Technological University in Singapore made history by winning the prestigious Diversity & Social Inequality Award, the first time a Chinese team has received this honor. The paper, titled AIPsychoBench: Understanding the Psychometric Differences between LLM and Humans, received a perfect score from reviewers and marks a major step forward in the emerging field of machine psychology. The award, presented by the Cognitive Science Society, the world's leading academic body in cognitive science, recognizes research that advances understanding of psychological and social diversity.

The winning study addresses a critical gap in AI research: how large language models (LLMs) differ from humans in psychological measurement. Although LLMs are trained on vast amounts of human-generated text and produce human-like responses, their cognitive mechanisms remain opaque. Existing methods often apply human psychological scales directly to LLMs, but this approach faces two major challenges.

First, LLMs are typically aligned to avoid bias and give neutral responses. When asked about personal preferences, such as whether they prefer meeting new friends or chatting with old ones, they often respond with disclaimers like "I am an AI and cannot attend gatherings," rendering the data unusable for measuring psychological traits.

Second, LLMs give inconsistent responses across languages, even though human psychological traits remain stable regardless of the language of testing. For example, when asked about refusing others, a model may choose "direct refusal" in English but "polite evasion" in Chinese, reflecting cultural nuances in the training data rather than a stable internal disposition.

To overcome these issues, the team developed a rigorous measurement framework.
They curated 21 standardized psychological scales across six domains (motivation, personality, relationships, self-assessment, emotional intelligence, and collective identity), totaling 777 Likert-scale questions. To bypass alignment restrictions, they introduced a lightweight role-playing prompt that instructs the model to respond as a "real human survey participant" expressing genuine feelings. This method boosted valid response rates by up to 41% without introducing significant bias, unlike stronger jailbreak techniques. The team also translated the tests into seven major languages (Chinese, Russian, French, Spanish, Arabic, German, and Japanese) to analyze cross-linguistic effects, and used GPT-4o to verify consistency between numerical scores and textual explanations, ensuring data quality.

The results revealed two key insights. First, lightweight role-playing enables efficient, low-bias psychological measurement, making it well suited to future research. Second, language significantly affects LLM responses, with differences exceeding 20% on religious topics (Arabic vs. English) and 9% on perfectionism (Chinese and Japanese vs. English), indicating that language is a critical confounding variable in LLM psychological evaluation.

The study introduces AIPsychoBench, the first standardized, multilingual, and bias-controlled benchmark for LLM psychological measurement. The framework not only advances the scientific rigor of machine psychology but also positions China as a leader in interdisciplinary AI research, and it underscores the need for culturally and linguistically sensitive methods when evaluating AI cognition.

In a parallel development, Dr. Lin Qika of the National University of Singapore and collaborators have developed DeepMedix-R1, a medical foundation model for chest X-ray interpretation. Unlike conventional models that offer black-box diagnoses, DeepMedix-R1 generates structured, traceable reasoning steps linked precisely to image regions.
By combining high-quality synthetic reasoning data with online reinforcement learning, the model achieves strong performance in report generation and visual question answering, improving both diagnostic accuracy and explainability, which are key to clinical trust. The team plans to extend the model to CT and MRI, conduct clinical trials, and explore real-world deployment, with the ambition of using AI to bridge gaps in healthcare access.

These advances highlight a growing trend: AI is moving beyond raw performance toward deeper understanding, transparency, and societal relevance.
