HyperAI

GPT-5 Released, Sam Altman: It's Like Talking to a PhD Expert, With Key Upgrades for Programming, Writing, and Health


"GPT-3 feels like talking to a high school student, GPT-4 feels like talking to a college student, and GPT-5 feels like talking to a doctoral-level expert." At the just-concluded press conference, Sam Altman praised GPT-5 in his opening remarks, calling it "the world's most powerful model for programming and writing."

Building a unified system

GPT-5 is a unified system with three parts: an intelligent, efficient model that answers most questions (GPT-5-main), a deep reasoning model (GPT-5-thinking) for more complex problems, and a real-time router that quickly decides which model to use based on the conversation type, question complexity, required tools, and the user's stated intent. The router is trained continuously on real-world signals, including when users switch between models, which answers they prefer, and how accurate the responses are measured to be, so it keeps improving over time.
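To make the routing idea concrete, here is a minimal sketch of a rule-based router. It is only an illustration: the real router is described as a trained model, and everything below (the `Request` fields, the `estimate_complexity` heuristic, the keyword list, and the threshold) is a hypothetical stand-in rather than OpenAI's implementation. Only the two model names come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request shape, used only for this illustration."""
    text: str
    needs_tools: bool = False      # assumed signal: the task requires tool calls
    asked_to_think: bool = False   # assumed signal: user explicitly asked for careful reasoning


def estimate_complexity(text: str) -> float:
    """Toy heuristic: longer prompts and reasoning-style keywords score higher."""
    keywords = ("prove", "debug", "step by step", "optimize")
    length_score = min(len(text) / 500, 1.0)
    keyword_score = 0.5 * sum(k in text.lower() for k in keywords)
    return length_score + keyword_score


def route(req: Request, threshold: float = 0.5) -> str:
    """Pick a model name from stated intent, tool needs, and estimated complexity."""
    if req.asked_to_think or req.needs_tools:
        return "gpt-5-thinking"
    if estimate_complexity(req.text) >= threshold:
        return "gpt-5-thinking"
    return "gpt-5-main"


if __name__ == "__main__":
    print(route(Request("What is the capital of France?")))
    print(route(Request("Please debug this race condition step by step")))
```

A trained router would replace the hand-written heuristic with a learned scoring function, updated from the feedback signals mentioned above.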

According to OpenAI's official public documentation, the reasoning models, including gpt-5-thinking, gpt-5-thinking-mini, and gpt-5-thinking-nano, are trained with reinforcement learning to improve their reasoning ability. These models "think" before answering, generating a full internal chain of thought before responding to the user. Through training, they learn to refine their thinking process, try different strategies, and recognize their own mistakes.

According to OpenAI's evaluation, GPT-5 (with reasoning mode enabled) outperforms OpenAI o3 in capabilities including visual reasoning, agentic coding, and graduate-level scientific problem solving, while using 50% to 80% fewer output tokens.

At the same time, on the Aider polyglot test, which evaluates coding ability, GPT-5 sets a new record with a score of 88%, reducing the error rate by one-third compared to o3.

GPT-5 also sets a new state of the art in several areas, scoring 94.6% on the AIME 2025 test, 74.9% on the real-world coding benchmark SWE-bench Verified, and 84.2% on MMMU. With the enhanced reasoning capabilities of GPT-5 Pro, the model also scored 88.4% on GPQA (Graduate-Level Google-Proof Q&A), likewise a new state of the art.

Focus on improving three major scenarios: writing, programming and health consultation

OpenAI reports that the three most common uses of ChatGPT are writing, programming, and health, and that GPT-5's performance has been further improved in all three.

According to OpenAI, GPT-5 is its most powerful programming model to date. It shows significant improvements in complex front-end generation and in debugging large code bases. From a single prompt, it can generate attractive, responsive websites, applications, and games, demonstrating a strong aesthetic sense. GPT-5 also excels at in-depth analysis of code bases, accurately answering questions about how code modules work and how they interact.

Beyond programming, GPT-5 also performs very well on agentic tasks, setting new records on benchmarks for instruction following (69.6% on Scale MultiChallenge) and tool calling (96.7% on τ²-bench telecom).

On the LongFact and FactScore benchmarks, GPT-5's factual error rate is about 80% lower than o3's. This makes GPT-5 particularly well suited to agentic tasks with high correctness requirements, especially in key areas such as code generation, data processing, and decision support.

In creative writing, GPT-5 can produce copy with literary depth, rhythm, and resonance. It is more reliable on writing tasks with ambiguous structure, such as sustaining a consistent iambic meter, and it can express ideas clearly and forcefully while respecting the stylistic form. This makes it more practical for everyday work such as drafting and polishing reports, emails, and memos.

It is worth mentioning that OpenAI has also added a new Verbosity API parameter to control the default length of GPT-5's answers. The parameter accepts three values: low, medium, and high. If an explicit instruction conflicts with the verbosity parameter, the explicit instruction takes precedence. For example, if a user asks GPT-5 to "write a five-paragraph essay," the model's response should always contain five paragraphs.
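As a rough sketch of how a client might set this parameter, the helper below builds a request payload and validates the verbosity value locally. The function name `build_request` is hypothetical, and the payload layout (a `text` object with a `verbosity` field alongside `model` and `input`) is an assumption about the request shape that should be checked against OpenAI's API reference. No network call is made.

```python
ALLOWED_VERBOSITY = ("low", "medium", "high")  # the three values named above


def build_request(prompt: str, verbosity: str = "medium") -> dict:
    """Build a request payload with a verbosity setting.

    The field layout below is an assumption for illustration only;
    consult the official API reference for the exact request schema.
    """
    if verbosity not in ALLOWED_VERBOSITY:
        raise ValueError(
            f"verbosity must be one of {ALLOWED_VERBOSITY}, got {verbosity!r}"
        )
    return {
        "model": "gpt-5",
        "input": prompt,
        "text": {"verbosity": verbosity},  # assumed placement of the parameter
    }


if __name__ == "__main__":
    # The explicit instruction in the prompt ("five-paragraph") is expected to
    # take precedence over the low verbosity setting, per the rule above.
    payload = build_request("Write a five-paragraph essay on tides.", "low")
    print(payload)
```

Validating the value on the client side simply fails fast on typos; the precedence rule itself is a property of the model's behavior, not something the client enforces.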

On health-related questions, GPT-5 achieved a record score of 46.2% on the HealthBench benchmark. It can proactively flag potential health concerns and give precise recommendations based on the user's background knowledge and geographic location.

OpenAI has been busy lately. It just claimed a new SOTA position in the open-source field with gpt-oss, and now it has released the highly anticipated GPT-5. Releasing multiple products in quick succession demonstrates its technical strength. As for how the model actually holds up in terms of performance and safety, it is better to "let the bullets fly for a while" and wait for real-world testing.

References:

1. https://www.theverge.com/openai/748017/gpt-5-chatgpt-openai-release

2. https://cdn.openai.com/pdf/8124a3ce-ab78-4f06-96eb-49ea29ffb52f/gpt5-system-card-aug7.pdf
