Thinking Machines Lab Tackles AI Inconsistency with Reproducible Model Responses
Thinking Machines Lab, the AI research startup led by Mira Murati, is tackling one of the most persistent challenges in large language models: inconsistency in responses. With $2 billion in seed funding and a team of former OpenAI researchers, the company has drawn significant attention since its launch. In a newly published blog post titled “Defeating Nondeterminism in LLM Inference,” the lab offers its first detailed look at one of its core research initiatives: making AI models produce reproducible, deterministic outputs.

Currently, most AI models, including those powering popular chatbots like ChatGPT, generate slightly different responses to the same prompt each time. This variability, known as nondeterminism, is widely accepted in the AI community as an inherent limitation. Thinking Machines Lab argues that it is not an unavoidable flaw but a problem rooted in how software interacts with hardware during inference, the stage at which a trained model computes a response to a user’s query.

The post, authored by researcher Horace He, traces the randomness in AI responses to the way GPU kernels (small computational programs that run on Nvidia chips) are orchestrated during inference. These kernels execute the calculations needed to generate responses, and subtle differences in how their work is scheduled and combined can change the numbers they produce. He suggests that by tightly controlling how the kernels are scheduled and executed, it’s possible to eliminate much of the unpredictability in model outputs. A short code sketch at the end of this article illustrates the floating-point effect at the heart of the issue.

The implications of this work are significant. For enterprises and researchers, more consistent responses mean greater reliability when using AI for critical tasks such as code generation, data analysis, or scientific discovery. Beyond usability, He highlights that deterministic outputs could also improve reinforcement learning (RL), a key method for training AI models. In RL, models are rewarded for correct answers, but if the same prompt yields different outputs from run to run, the reward signal becomes noisy; reducing that variability makes the training process more stable and efficient. A second sketch below makes the noise argument concrete.

Thinking Machines Lab has previously told investors it plans to use RL to tailor AI models for specific business applications, and the research described in the blog post may underpin that effort by enabling more predictable and controllable model customization. Murati, who served as OpenAI’s chief technology officer, said in July that the lab’s first product, expected to be useful for researchers and startups building custom AI models, will be unveiled in the coming months. It remains unclear whether that product will directly implement the techniques described in the post, but the research suggests a strong focus on reliability and control.

The lab has committed to sharing its findings openly through regular blog posts, code releases, and public research updates. The new series, called “Connectionism,” reflects a deliberate effort to foster transparency and collaboration, a contrast to OpenAI’s shift toward more closed development as it scaled. Whether Thinking Machines Lab can maintain this open culture while building commercially viable technology remains to be seen.

While the blog post doesn’t reveal the full scope of the lab’s future work, it signals that the company is digging into foundational challenges in AI. The real test will be whether it can turn these insights into practical tools that justify its $12 billion valuation and deliver on its promise of more predictable, trustworthy AI.
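To make the mechanism concrete, here is a minimal sketch in plain Python and NumPy rather than GPU code. It is an illustration of the general phenomenon, not code from the lab’s post: floating-point addition is not associative, so two schedules that combine the same partial results in different orders can produce slightly different numbers, and in a model those tiny discrepancies can eventually flip which token gets sampled.

```python
import numpy as np

# Illustrative only: floating-point addition is not associative, so the
# grouping a kernel schedule uses changes the result in the last bits.
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000).astype(np.float32)

# Same numbers, two groupings: a strict left-to-right sum versus a blocked
# "tree" reduction of the kind a parallel GPU kernel typically performs.
left_to_right = np.float32(0.0)
for v in x:
    left_to_right += v

blocked = x.reshape(100, 1_000).sum(axis=1).sum()

print(left_to_right)             # one float32 value...
print(blocked)                   # ...and a nearby but unequal one
print(left_to_right == blocked)  # typically False: same data, different order
```

He’s post argues that on real inference servers the grouping in play can depend on how many requests happen to be batched together, which is why identical prompts can come back with different answers even when sampling settings are fully deterministic.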
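The RL point can be shown with an equally small toy, again a hypothetical sketch with made-up numbers rather than anything from the lab: if the same prompt sometimes earns a reward and sometimes doesn’t purely because the output varies, the per-prompt training signal carries variance that a deterministic model would not have.

```python
import random

random.seed(0)

# Toy reward: 1.0 if the model's answer to a fixed prompt is correct.
# The hypothetical nondeterministic model flips between a correct and an
# incorrect answer to the *same* prompt; the deterministic one does not.
def reward(deterministic: bool) -> float:
    if deterministic:
        return 1.0                                 # same output, same reward
    return 1.0 if random.random() < 0.7 else 0.0   # output varies run to run

def mean_and_variance(samples):
    m = sum(samples) / len(samples)
    return m, sum((s - m) ** 2 for s in samples) / len(samples)

det = [reward(True) for _ in range(10_000)]
nondet = [reward(False) for _ in range(10_000)]

print(mean_and_variance(det))     # (1.0, 0.0): a clean signal
print(mean_and_variance(nondet))  # about (0.7, 0.21): same prompt, noisy
```

Averaged over a training run, that extra variance is the “noisy training signal” described above, and it is the instability that deterministic inference would remove.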