OpenAI Researchers Pinpoint Cause of AI Hallucinations: Overemphasis on Guessing in Training
OpenAI researchers have identified the core reason behind one of the most persistent problems in large language models: hallucinations. These occur when AI systems generate confident but factually incorrect information and present it as truth. The issue affects even the most advanced models, including OpenAI's GPT-5 and Anthropic's Claude.

In a paper released on Thursday, the researchers argued that hallucinations stem from the way these models are trained and evaluated. Rather than being rewarded for admitting uncertainty, LLMs are incentivized to produce an answer, any answer, even when they are unsure. This creates a fundamental misalignment: models are essentially trained to "fake it till they make it."

The researchers explained that large language models are almost always operating in "test-taking mode," where responses are judged as either right or wrong. This binary framework fails to reflect real-world complexity, where uncertainty is common and absolute accuracy is often unattainable.

Humans, in contrast, learn to express doubt through experience and real-world consequences, what the researchers call the "school of hard knocks." Language models are rarely exposed to such feedback; instead, they are judged on exams that penalize hesitation and reward confident answers, even incorrect ones. As a result, models learn that guessing improves their performance scores. The paper notes that this dynamic is especially pronounced in evaluation systems that prioritize accuracy over honesty.

OpenAI highlighted that models like Claude have shown greater awareness of uncertainty, often refusing to answer when unsure. This cautious behavior can reduce their usefulness in practical applications, however, creating a trade-off between reliability and usability.

The solution, according to the researchers, lies in rethinking how models are evaluated. "The root problem is the abundance of evaluations that are not aligned," they wrote. 
"The numerous primary evaluations must be adjusted to stop penalizing abstentions when uncertain."

OpenAI emphasized that current scoring systems need to change. "The widely used, accuracy-based evals need to be updated so that their scoring discourages guessing," the company stated in a follow-up blog post. If models continue to be rewarded for lucky guesses, they will keep learning to guess, regardless of the consequences.

The findings mark a significant step toward building more trustworthy AI. By redesigning evaluation metrics to value honesty and uncertainty over blind confidence, OpenAI suggests a path toward models that are not only more accurate but also more reliable in real-world use.
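The incentive the researchers describe can be sketched with simple expected-value arithmetic. The snippet below is an illustrative toy, not anything from the paper: the confidence values and the wrong-answer penalty of 1 are assumptions chosen to show the effect. Under plain accuracy scoring, answering always has a non-negative expected score, so guessing weakly beats abstaining no matter how unsure the model is; once wrong answers cost points, abstaining becomes the rational choice below a confidence threshold.

```python
def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score of answering: +1 if correct, -wrong_penalty if wrong."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

ABSTAIN_SCORE = 0.0  # saying "I don't know" scores zero in both schemes

for p in (0.2, 0.5, 0.8):
    # Plain accuracy eval: a wrong answer costs nothing, so the expected
    # score of guessing is just p, which is never worse than abstaining.
    acc_guess = expected_score(p, wrong_penalty=0.0)
    # Penalized eval: a wrong answer costs 1 point, so guessing only pays
    # off when the model is more likely right than wrong (p > 0.5).
    pen_guess = expected_score(p, wrong_penalty=1.0)
    print(
        f"confidence {p:.1f} | accuracy eval: guess {acc_guess:+.2f} "
        f"vs abstain {ABSTAIN_SCORE:+.2f} | penalized eval: "
        f"guess {pen_guess:+.2f} vs abstain {ABSTAIN_SCORE:+.2f}"
    )
```

At 20% confidence, guessing scores an expected +0.20 on the accuracy eval but -0.60 on the penalized eval, so only the latter makes honest abstention the winning strategy, which is the evaluation change the researchers are calling for.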
