HyperAI

A new study finds that popular AI tools such as Perplexity, You.com, and Microsoft’s Bing Chat often produce unreliable, overconfident, and one-sided responses. Researchers from Salesforce AI Research, led by Pranav Narayanan Venkit, developed an audit framework called DeepTRACE to evaluate several public AI systems. The study used more than 300 questions in two categories: debate questions on contentious topics, such as whether alternative energy can replace fossil fuels, and expertise questions testing knowledge in specialized fields like computational hydrology.

About one-third of the claims made by the AI tools were not supported by the sources they cited. For OpenAI’s GPT-4.5, the figure rose to 47%. The systems also frequently answered with unwarranted confidence, even when the information was incorrect or incomplete. On debate topics, they tended to present only one side of an argument, reinforcing existing beliefs and creating an echo chamber effect that limits exposure to diverse perspectives.

Citation accuracy was another major concern. In some cases, the cited sources were accurate only 40% to 80% of the time, with many references failing to support the claims they were meant to back. Human reviewers verified the findings, confirming that the AI’s reasoning and source alignment were often flawed. The DeepTRACE framework evaluates AI systems on eight metrics, including overconfidence, bias, source reliability, and argument balance.

The researchers emphasize that while AI tools can save time by retrieving information quickly, their output should not be trusted without scrutiny. The study underscores the need for better safeguards in AI systems, especially those used for research, education, and decision-making. The findings, published on the arXiv preprint server, highlight the ongoing challenges in ensuring the safety and effectiveness of AI-powered search tools. The researchers stress that without improvements, these systems risk undermining user autonomy and spreading misinformation. The study calls for stronger sociotechnical frameworks to monitor and guide AI development, so that these tools serve users responsibly and transparently.
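
To make the two headline figures concrete, the sketch below shows one way an unsupported-claim rate and a citation accuracy could be computed from hand-labelled data. It is an illustration only and is not the DeepTRACE implementation: the data layout, the function names, and the toy labels are all assumptions made for this example, and the study's actual metric definitions may differ.

```python
from typing import Dict, Set

# Hypothetical layout: for each claim, the sources the AI cited, and the
# sources a human reviewer judged to actually support that claim.
Labels = Dict[str, Set[str]]


def unsupported_claim_rate(cited: Labels, supports: Labels) -> float:
    """Share of claims for which none of the cited sources supports the claim."""
    if not cited:
        return 0.0
    unsupported = sum(
        1 for claim, sources in cited.items()
        if not (sources & supports.get(claim, set()))
    )
    return unsupported / len(cited)


def citation_accuracy(cited: Labels, supports: Labels) -> float:
    """Share of (claim, cited source) pairs where the source supports the claim."""
    total = 0
    correct = 0
    for claim, sources in cited.items():
        for source in sources:
            total += 1
            if source in supports.get(claim, set()):
                correct += 1
    return correct / total if total else 0.0


# Toy labels, invented for illustration (not data from the study).
toy_cited: Labels = {"c1": {"s1", "s2"}, "c2": {"s2"}, "c3": {"s3"}}
toy_supports: Labels = {"c1": {"s1"}, "c2": set(), "c3": {"s3"}}

print(unsupported_claim_rate(toy_cited, toy_supports))  # 1 of 3 claims unsupported
print(citation_accuracy(toy_cited, toy_supports))       # 2 of 4 citations correct
```

In this toy case the two numbers differ because they count different things: the first is per claim, the second is per citation, so a claim with one good and one bad citation still counts as supported.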


AI Tools Often Unreliable and Overconfident, Study Reveals
