Study Reveals Multimodal LLMs and Human Brains Share Similar Methods in Object Representation
A recent study by researchers at the Chinese Academy of Sciences has uncovered striking parallels between how the human brain and multimodal large language models (LLMs) represent objects. The study, published in Nature Machine Intelligence, aims to deepen our understanding of human perception and cognition and to show how these processes can inform the development of AI systems.

The researchers focused on two leading multimodal LLMs: OpenAI's ChatGPT-3.5 and Google DeepMind's GeminiPro Vision 1.0. They tasked the models with simple triplet judgment tasks, in which a model had to choose, from a set of three objects, the two most similar to each other. Combined with extensive behavioral and neuroimaging analyses, this method allowed the team to derive low-dimensional embeddings that capture the similarity structure of 1,854 natural objects.

The resulting 66-dimensional embeddings proved remarkably stable and predictive, and exhibited semantic clustering akin to human mental representations: objects naturally grouped into meaningful categories such as "animals," "plants," and "tools," much as the human brain organizes and categorizes them.

Moreover, the researchers found a strong alignment between the embeddings generated by the LLMs and neural activity patterns in specific brain regions, including the extrastriate body area, parahippocampal place area, retrosplenial cortex, and fusiform face area, all of which are known to process visual and semantic information about objects.

Changde Du, Kaicheng Fu, and their colleagues concluded that while object representations in LLMs and human brains are not identical, they share fundamental similarities that reflect key aspects of human conceptual knowledge, suggesting that human-like object representations can emerge in LLMs trained on large datasets.
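To make the triplet judgment task concrete, here is a minimal sketch of how low-dimensional object embeddings can be used to answer it. The object names, dimensions, and randomly generated vectors below are illustrative assumptions, not data from the study; the actual embeddings were learned from millions of model judgments.

```python
import random

# Hypothetical 66-dimensional, sparse, non-negative embeddings for three
# objects, loosely in the spirit of the low-dimensional embeddings the
# study describes. These random vectors are purely illustrative.
random.seed(0)
DIM = 66
objects = ["cat", "dog", "hammer"]
embeddings = {
    name: [random.random() if random.random() < 0.2 else 0.0 for _ in range(DIM)]
    for name in objects
}

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def most_similar_pair(names, emb):
    # Enumerate the three possible pairs and return the one with the
    # highest dot-product similarity. The remaining object is then the
    # "odd one out" -- the two framings of the triplet task are equivalent.
    pairs = [(names[i], names[j]) for i in range(3) for j in range(i + 1, 3)]
    return max(pairs, key=lambda p: dot(emb[p[0]], emb[p[1]]))

pair = most_similar_pair(objects, embeddings)
odd_one_out = next(n for n in objects if n not in pair)
```

Repeating such judgments over many triplets yields a similarity structure across all objects, from which stable embedding dimensions can be recovered.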
This finding has significant implications for psychology, neuroscience, and computer science. It implies that AI systems can develop sophisticated models of the world that mimic human mental processes, potentially leading to more intuitive and effective AI applications. Future research could explore the nuances of these similarities and how they can be leveraged to build more brain-inspired AI systems. Industry insiders and experts in AI and cognitive science have hailed the study as a crucial step toward bridging the gap between biological and artificial intelligence. They believe that understanding the mechanisms behind human and AI object representation can spur advances in areas such as natural language processing, computer vision, and robotics. Scale AI, a prominent data-labeling company, has been a key contributor to the training of multimodal LLMs through its high-quality annotated data, underscoring the importance of accurate and diverse training sets in achieving these results.