HyperAI

OpenAI has launched GPT-5.2, which it calls its most powerful model series yet, designed to unlock greater economic value for professionals. Fidji Simo, CEO of Applications, described the release as the result of months of work, and it is positioned as a direct challenge to Google’s Gemini 3. The GPT-5.2 series includes Instant, Thinking, and Pro models, each optimized for everyday professional tasks such as creating spreadsheets, building presentations, writing code, analyzing images, handling long contexts, using tools, and managing complex, multi-step projects.

In internal testing, GPT-5.2 Thinking outperformed previous models across key benchmarks. On the GDPval test, which measures performance on 44 knowledge-intensive professional tasks, it achieved a 70.9% success rate, surpassing human experts in many cases. It also worked about 11x faster than the experts, at less than 1% of their cost. In software engineering, GPT-5.2 Thinking scored 55.6% on SWE-Bench Pro, a rigorous real-world coding benchmark, and 80% on SWE-bench Verified. It also excelled in math, science, and abstract reasoning, including a 99.4% score on the HMMT 2025 math competition and 40.3% on FrontierMath Tier 1–3.

The model is also more reliable and hallucinates less: errors dropped by 38% compared to GPT-5.1, which makes it more trustworthy for professionals who rely on accurate outputs. In long-context tasks, GPT-5.2 Thinking achieved near-perfect accuracy (up to 98.7%) on OpenAI MRCRv2, even at 256k tokens, enabling deeper analysis of lengthy documents such as contracts and research papers. In vision tasks, GPT-5.2 is OpenAI’s most advanced model yet, cutting error rates in image understanding by about half. It better interprets spatial relationships in technical diagrams, UIs, and control panels, which matters for engineering, design, and customer support.
In tool use, it scored 98.7% on Tau2-bench Telecom, showing a strong ability to manage multi-step workflows, such as handling complex customer service cases involving flight changes, medical seating, and compensation. OpenAI also highlighted GPT-5.2’s role in scientific research. In a real-world study, the model helped explore an open problem in statistical learning theory, with findings reviewed and validated by human experts. It also achieved 92.4% on GPQA Diamond, a top-tier science benchmark, and broke the 90% threshold on ARC-AGI-1, a test of general reasoning.

The models are rolling out first to paid ChatGPT users (Plus, Pro, Go, Business, Enterprise), with gradual deployment to ensure stability. GPT-5.1 will remain available for three months before being sunset. On the API, GPT-5.2 is already live under names like gpt-5.2-chat-latest and gpt-5.2-pro, with support for a new high-intensity reasoning level (xhigh) for premium tasks. Pricing is set at $1.75 per million input tokens and $14 per million output tokens, with a 90% discount on cached inputs. Although GPT-5.2 costs more per token than GPT-5.1, OpenAI says its higher efficiency lowers the overall cost of high-quality work. The models are supported by NVIDIA’s H100, H200, and GB200-NVL72 GPUs and Microsoft Azure infrastructure.

OpenAI also announced a three-year licensing deal with Disney, allowing user-generated social videos featuring characters from Marvel, Pixar, and Star Wars, some of which will stream on Disney+. Disney will also become a major customer and investor, with a $1 billion equity stake. Additionally, OpenAI is testing an age-prediction model to automatically apply safety filters for minors before launching ChatGPT’s “adult mode,” expected in Q1 2026. The company is also addressing known issues such as over-refusal, with a “code red” internal push to prioritize model quality over features like advertising.
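As a rough illustration of the API pricing quoted above, the following Python sketch estimates the cost of a single request. The rates ($1.75 per million input tokens, $14 per million output tokens, 90% cache discount) come from the article; the function name `estimate_cost_usd`, its parameters, and the assumption that the discount applies only to cached input tokens are hypothetical and not taken from OpenAI's documentation.

```python
from decimal import Decimal

# Rates as quoted in the article (USD per one million tokens).
INPUT_USD_PER_MTOK = Decimal("1.75")
OUTPUT_USD_PER_MTOK = Decimal("14")
# Assumption: the "90% cache discount" applies to cached input tokens only.
CACHE_DISCOUNT = Decimal("0.90")
MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      cached_input_tokens: int = 0) -> Decimal:
    """Estimate the cost of one request (hypothetical helper, not an OpenAI API)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")

    uncached = Decimal(input_tokens - cached_input_tokens)
    cached = Decimal(cached_input_tokens)
    cached_rate = INPUT_USD_PER_MTOK * (1 - CACHE_DISCOUNT)

    return (uncached * INPUT_USD_PER_MTOK
            + cached * cached_rate
            + Decimal(output_tokens) * OUTPUT_USD_PER_MTOK) / MILLION


if __name__ == "__main__":
    # 200k input tokens (150k of them cached) plus 10k output tokens.
    print(estimate_cost_usd(200_000, 10_000, cached_input_tokens=150_000))
```

Under these assumptions, output tokens dominate the bill for short prompts, while caching mainly helps workloads that resend a long, unchanged context.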
GPT-5.2 marks a major leap in AI capability, combining speed, accuracy, and reliability for real-world professional use, while setting a new standard in the ongoing race for AI dominance.

OpenAI Unveils GPT-5.2 in Push for Agentic AI Leadership
