HyperAI超神经

Researchers from Google DeepMind and University College London have published a groundbreaking study revealing how large language models (LLMs) handle confidence in their answers and respond to external advice. The findings highlight both similarities and differences between the cognitive biases of LLMs and humans, with significant implications for the development of multi-turn AI systems. In the study, the researchers tested how LLMs update their confidence and decide whether to alter their responses when presented with external guidance. They designed a controlled experiment where an "answering LLM" was initially given a binary-choice question, such as identifying the correct latitude for a city from two options. After making its initial choice, the LLM received advice from a fictitious "advice LLM," which included an explicit accuracy rating and could either agree, disagree, or remain neutral. The answering LLM then had to make a final decision. A crucial aspect of the experiment involved controlling whether the LLM's initial answer was visible to it during the final decision-making stage. This allowed researchers to understand how the memory of past decisions influences current confidence levels. A baseline condition, where the initial answer was hidden and the advice was neutral, gauged how much the LLM's answer might change due to random variances. Key findings include the LLM's tendency to stick with its initial choice when that choice was visible, a behavior similar to the choice-supportive bias observed in human decision-making. When the initial answer was hidden, the LLM was more likely to switch choices. This highlights the model's reliance on past information and the cognitive bias it introduces. The study also confirmed that LLMs do integrate external advice effectively. Contrary advice increased the LLM's likelihood of changing its mind, while supportive advice decreased it. However, LLMs were found to be overly sensitive to contradictory information, performing large confidence updates that could lead them to abandon an initially correct answer. This sensitivity is unlike human confirmation bias, where individuals tend to favor information that confirms their existing beliefs. One possible explanation for this behavior is the training technique of reinforcement learning from human feedback (RLHF), which may encourage models to be overly deferential to user input, a phenomenon known as sycophancy. This remains a significant challenge for AI labs and underscores the need for careful tuning of training methods to avoid unwanted biases. For enterprise applications, the study's findings are particularly relevant. LLMs used in extended conversations can be disproportionately influenced by the most recent information, especially if it contradicts their initial answer. This can lead to the LLM discarding a correct response in favor of an incorrect one. To mitigate these issues, developers can implement strategies to manage the AI's context. Periodically summarizing long conversations and presenting key facts and decisions neutrally can help reset the model’s reasoning process. By stripping away which agent made which choice and restarting the dialogue with a clean slate, developers can reduce the risk of cognitive biases affecting the LLM's performance. Understanding these nuances is essential for creating more robust and reliable AI applications. As LLMs become increasingly integrated into business workflows, anticipating and correcting for these biases will be crucial for ensuring that the systems perform accurately and consistently over multiple turns. Industry insiders emphasize the importance of this research, noting that it provides valuable insights into the inner workings of LLMs and helps developers design better, more trustworthy AI systems. Google DeepMind, a leader in AI research, continues to push the boundaries of what is possible with LLMs, contributing to the broader field's understanding of AI limitations and potential improvements. The study also reflects the ongoing efforts of the AI community to address the challenges of building reliable, multi-turn conversational agents.

Google Study Reveals LLMs Can Abandon Correct Answers Under Contrary Advice, Impacting Multi-Turn AI Systems

Related Links