Google's FACTS Benchmark Reveals Top AI Model Is Only 69% Accurate, Highlighting Critical Reliability Gaps in AI Fact-Checking
A new benchmark from Google DeepMind reveals that even the most advanced AI models still get their facts wrong nearly one-third of the time. The FACTS Benchmark Suite, introduced this week, evaluates AI systems across four key areas: answering fact-based questions from internal knowledge, using web search effectively, accurately extracting information from long documents, and interpreting visual content.

The top-performing model, Google's Gemini 3 Pro, scored 69%, meaning it gave correct answers only about two out of every three times. Other leading models, including versions from OpenAI and Anthropic, scored significantly lower, a sign that factual reliability remains a major hurdle across the industry. For context, a journalist who filed stories that were accurate only 69% of the time would likely be let go.

That gap between AI's impressive fluency and its real-world trustworthiness is the core issue. Models can generate text that sounds confident and coherent, yet they often fabricate details, misinterpret sources, or fail to ground responses in verified information.

The implications extend far beyond journalism. In high-stakes fields like healthcare, finance, and law, even minor inaccuracies can have serious consequences. Melia Russell, a colleague at Business Insider, recently explored how law firms are grappling with AI-generated misinformation; one firm reportedly fired an employee who submitted a legal filing filled with fabricated case law generated by ChatGPT, a stark example of the risks of unchecked AI use.

The FACTS benchmark isn't just a critique; it's a roadmap. By systematically identifying where models fail, Google aims to guide improvements in training data, reasoning capabilities, and verification mechanisms. For now, though, the message is clear: AI is advancing rapidly, but it remains unreliable when it comes to factual accuracy. Organizations relying on AI for critical decisions should proceed with caution, pairing it with human oversight and rigorous validation to mitigate the risk of error.
