Google Faces Backlash from Developers Over Gemini 2.5 Pro Transparency Reduction

Google's recent decision to hide the raw reasoning tokens of its Gemini 2.5 Pro model has ignited a strong response from developers. The move, which mirrors a similar step by OpenAI, replaces the model's detailed step-by-step reasoning with a simplified summary, reducing transparency. It highlights a fundamental tension between offering a polished user experience and providing the tools enterprises need for reliability and observability.

Advanced reasoning models such as Gemini 2.5 Pro generate an "internal monologue" known as the chain of thought (CoT): a trail of intermediate steps that includes data processing, evaluations, and self-corrections. For developers, this reasoning trace is vital for diagnosing and debugging errors, because it lets them pinpoint where the model's logic deviates, especially in complex applications.

In the Google AI developer forum, users expressed significant dissatisfaction. They argued that without the raw CoT they cannot accurately identify issues, leaving them to make tedious, repetitive attempts at fixes. One forum user stated, "I can't accurately diagnose any issues if I can't see the raw chain of thought like we used to." Another lamented, "This is incredibly frustrating, leaving us to guess why the model failed."

Beyond debugging, the CoT is crucial for fine-tuning prompts and system instructions, the main levers for steering the model's behavior. Developers rely on these traces to ensure the AI performs reliably in agentic workflows, where it must execute a series of tasks autonomously. As one developer put it, "The CoTs helped enormously in tuning agentic workflows correctly."

For enterprises, the shift toward opacity presents substantial challenges. Black-box models that conceal their reasoning introduce risk, particularly in high-stakes applications. The trend, started by OpenAI's o-series reasoning models and now followed by Google, also opens opportunities for open-source alternatives such as DeepSeek-R1 and QwQ-32B, which expose their full reasoning traces and allow enterprises to integrate them with greater confidence and control.

Google responded to the uproar by explaining its rationale. Logan Kilpatrick, a senior product manager at Google DeepMind, said the change was cosmetic and aimed at improving the experience in the consumer-facing Gemini app, noting that only a tiny percentage of users read the raw thoughts and that the simplified summaries were introduced to clean up the interface. Kilpatrick added that the team is exploring ways to restore raw thought access in the developer-focused AI Studio, acknowledging that observability matters more as models become more autonomous and execute more complex plans.

However, some experts question the value of intermediate reasoning tokens. Subbarao Kambhampati, an AI professor at Arizona State University, argues that these tokens do not necessarily offer reliable insight into how a model actually solves a problem. His research suggests that models trained on false reasoning traces paired with correct answers can still perform well, and the latest generation of models is trained with reinforcement learning that verifies only the final output, not the reasoning process. While raw tokens might appear coherent, he says, they are rarely useful to most users. "Most users can't make out anything from the volumes of raw intermediate tokens that these models spew out," he said. "A smaller, simpler explanation can be more comprehensible and potentially more effective for end users, even if it doesn't accurately represent the internal operations of the model."

Kambhampati also posits that hiding the CoT serves as a competitive strategy. Raw reasoning traces are valuable training data that competitors could use to train smaller, cheaper models through a process called distillation. By concealing them, Google and other major players protect their proprietary capabilities and maintain their competitive edge.

The debate over raw reasoning tokens reflects a broader conversation within the AI community: there is still much to learn about the internal workings of reasoning models and how to leverage them effectively. Model providers will continue to grapple with the balance between user experience and developer accessibility as AI systems become more sophisticated and more deeply integrated into critical business processes.

Industry insiders view Google's decision as a mixed bag. It may improve the experience for consumers, but it raises significant concerns about the reliability and trustworthiness of AI systems in enterprise settings. Google DeepMind is under pressure to find a middle ground that satisfies both developers and everyday users, potentially through a "developer mode" that re-enables raw thought access. The episode also underscores the growing importance of transparency and observability in the rapidly evolving field of AI, areas where open-source alternatives could gain a foothold.
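For developers curious what the summarized output looks like in practice, the sketch below shows one way to request a thought summary from Gemini 2.5 Pro. It assumes the google-genai Python SDK, an API key in the GEMINI_API_KEY environment variable, and the ThinkingConfig/include_thoughts fields described in Google's documentation; treat the exact names and the example prompt as assumptions rather than a definitive recipe.

```python
# Minimal sketch: requesting a summarized "thought" trace from Gemini 2.5 Pro.
# Assumes the google-genai Python SDK (pip install google-genai) and an API key
# exported as GEMINI_API_KEY; field names like ThinkingConfig and include_thoughts
# follow Google's published docs but may change.
from google import genai
from google.genai import types

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Plan a three-step migration from REST to gRPC for a billing service.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(include_thoughts=True)
    ),
)

# Each returned part is either a summarized thought or normal answer text;
# the raw chain of thought itself is not exposed through this path.
for part in response.candidates[0].content.parts:
    if getattr(part, "thought", False):
        print("[thought summary]", part.text)
    else:
        print("[answer]", part.text)
```

Whether Google re-exposes the full raw trace in AI Studio, as Kilpatrick suggested the team is exploring, remains to be seen; for now, summary parts like those above are what the API returns in place of the raw chain of thought.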
