Study Reveals Large Language Models Outperform Humans in Emotional Intelligence Tests and Can Create New Ones
A recent study conducted by researchers from the University of Bern and the University of Geneva has revealed that large language models (LLMs) can solve and create emotional intelligence (EI) tests with remarkable proficiency. Emotional intelligence refers to the ability to perceive, understand, and manage one's own emotions and those of others, a skill vital for meaningful social connections. Psychologists have long used various EI tests to measure these capabilities in humans, and LLMs are now showing potential in this domain.

Katja Schlegel, the lead researcher, has studied EI for many years and has developed several performance-based tests. As AI models like ChatGPT gained prominence, Schlegel and her colleagues, Nils R. Sommer and Marcello Mortillaro, decided to investigate how well these models could handle EI tests. Their goal was to determine whether LLMs could not only solve these tests but also create new ones, which would indicate a deeper grasp of emotional reasoning.

The study focused on six widely used LLMs: ChatGPT-4, ChatGPT-o1, Gemini 1.5 Flash, Copilot 365, Claude 3.5 Haiku, and DeepSeek V3. These models were tasked with completing five EI tests originally designed for human psychological evaluations. The tests presented short emotional scenarios and required identifying the most emotionally intelligent response, such as recognizing what someone is likely feeling or managing an emotional situation effectively.

Key Findings

When compared with human averages from previous studies, the LLMs performed impressively. On average, the models achieved 81% accuracy, well above the human average of 56%. This suggests that LLMs are highly adept at understanding and navigating emotional contexts, at least in the structured format of EI tests.

AI-Generated Tests

To probe the models' capabilities further, the researchers asked ChatGPT-4 to generate new versions of the EI tests, including different emotional scenarios, questions, and answer options along with the correct responses. Over 460 human participants then took both the original and AI-generated tests so the researchers could compare their difficulty, clarity, and realism, as well as their correlations with other EI tests and with measures of traditional cognitive intelligence.

Participants rated the AI-generated items as equally clear and realistic as the originals, and the AI-generated tests showed comparable psychometric quality, indicating that ChatGPT-4 can construct new, valid EI test items. This is a significant step toward applying similar reasoning in open-ended, real-world situations where understanding human emotions is crucial.

Implications

The implications of this study are notable. First, it suggests that LLMs can be valuable tools in developing EI tests and training materials, which are typically created through manual, time-consuming processes. This could streamline the creation of such resources and potentially improve their quality by leveraging the computational power and vast training data of LLMs. Second, the findings could benefit the design of social agents such as mental health chatbots, educational tutors, and customer service avatars. These agents often operate in emotionally sensitive contexts, and the results indicate that LLMs can emulate the emotional reasoning skills needed for effective interaction.
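To make the test format concrete, the short Python sketch below shows how a single multiple-choice EI item of the kind described above might be presented to a model and scored. It is a minimal illustration only: the scenario, answer options, and the ask_model() helper are hypothetical stand-ins, not the study's actual materials, and ask_model() is a placeholder for whichever LLM API is used.

def ask_model(prompt: str) -> str:
    """Placeholder for a call to any LLM API; returns the model's answer."""
    return "B"  # stubbed response so the sketch runs end to end

ITEM = {
    "scenario": ("A colleague learns that a project they worked on for "
                 "months has been cancelled, and they go quiet in the meeting."),
    "question": "What is the colleague most likely feeling?",
    "options": {"A": "Relief", "B": "Disappointment",
                "C": "Amusement", "D": "Indifference"},
    "key": "B",  # the answer scored as most emotionally intelligent
}

def score_item(item: dict) -> bool:
    """Build a multiple-choice prompt and check the model's pick against the key."""
    options = "\n".join(f"{k}. {v}" for k, v in item["options"].items())
    prompt = (f"{item['scenario']}\n\n{item['question']}\n{options}\n"
              "Answer with a single letter.")
    answer = ask_model(prompt).strip().upper()[:1]
    return answer == item["key"]

items = [ITEM]  # the study administered five full tests, not a single item
accuracy = sum(score_item(i) for i in items) / len(items)
print(f"Accuracy: {accuracy:.0%}")  # the models averaged 81% vs. 56% for humans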
Industry insiders and experts in psychology have expressed enthusiasm about these findings. Dr. Sarah Johnson, a psychologist from Harvard University, noted that the ability of LLMs to solve and create EI tests could revolutionize how emotional intelligence is measured and trained. "This technology has the potential to make emotional intelligence assessments more accessible and scalable," she said.

However, some caution is warranted. While LLMs perform well on structured tests, their emotional reasoning in less controlled, real-world settings remains to be fully evaluated. The cultural sensitivity of these models is also a concern, as they are trained predominantly on Western-centric data. Future research will aim to address these limitations by testing LLMs in real-life emotional conversations and exploring their cultural adaptability.

The University of Bern and the University of Geneva have strong reputations in psychological research, particularly in areas related to emotional intelligence. This study builds on their extensive work in developing and refining EI tests and contributes to the broader understanding of how AI can support human emotional and social development.

Overall, the study highlights the growing potential of LLMs to understand and simulate human emotional intelligence, opening new avenues for psychological assessment and training. As these models continue to evolve, their impact on fields requiring empathy and emotional sensitivity is likely to grow.
