GPT-5 Released, Sam Altman: It's Like Talking to a PhD Expert, With Key Upgrades for Programming, Writing, and Health

a year ago

"GPT-3 feels like talking to a high school student, GPT-4 feels like talking to a college student, and GPT-5 feels like talking to a doctoral-level expert."At the just-concluded press conference, Sam Altman spoke highly of GPT-5 in his opening remarks - GPT-5 is "the world's most powerful model for programming and writing."

Building a unified system

GPT-5 is a unified system with three parts: an intelligent and efficient model that answers most questions (gpt-5-main), a deep reasoning model for more complex problems (gpt-5-thinking), and a real-time router that quickly decides which model to use based on the conversation type, question complexity, required tools, and the user's stated intent. The router is continuously trained on real-world signals, including when users switch between models, which answers they prefer, and measured response accuracy, so it keeps improving over time.
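The routing idea can be illustrated with a toy sketch. This is not OpenAI's router, which is a trained component; the hand-written rules, the `Request` fields, the `estimate_complexity` heuristic, and the 0.5 threshold below are all hypothetical and exist only to show how the four inputs named above (conversation type, complexity, tool needs, stated intent) might feed one decision.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical view of an incoming message (all fields are assumptions)."""
    text: str
    needs_tools: bool = False      # the task requires tool calls
    asked_to_think: bool = False   # explicit user intent, e.g. "think hard about this"


def estimate_complexity(text: str) -> float:
    """Toy complexity score in [0, 1] from length and a few keywords."""
    keywords = ("prove", "debug", "step by step", "analyze")
    length_score = min(len(text.split()) / 200, 1.0)
    keyword_score = 0.5 if any(k in text.lower() for k in keywords) else 0.0
    return min(length_score + keyword_score, 1.0)


def route(request: Request, threshold: float = 0.5) -> str:
    """Return the model name this toy router would pick."""
    # Stated intent and tool requirements short-circuit the complexity check.
    if request.asked_to_think or request.needs_tools:
        return "gpt-5-thinking"
    if estimate_complexity(request.text) >= threshold:
        return "gpt-5-thinking"
    return "gpt-5-main"


if __name__ == "__main__":
    print(route(Request("What is the capital of France?")))
    print(route(Request("Please debug this race condition")))
```

A production router would replace the fixed rules with a model trained on the feedback signals described above, such as model switches, answer preferences, and accuracy assessments.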

According to OpenAI's official documentation, the reasoning models, including gpt-5-thinking, gpt-5-thinking-mini, and gpt-5-thinking-nano, are trained with reinforcement learning to improve their reasoning ability. These models "think" before answering: they generate a long internal chain of thought before responding to the user. Through training, they learn to refine their thinking process, try different strategies, and recognize their own mistakes.

According to OpenAI's evaluation, GPT‑5 (with reasoning mode enabled) outperforms OpenAI o3 in capabilities including visual reasoning, agentic coding, and graduate-level scientific problem solving, while using 50% to 80% fewer output tokens.

Meanwhile, on the Aider polyglot benchmark, which evaluates coding ability, GPT‑5 set a new record with a score of 88%, reducing the error rate by one-third compared to o3.

GPT-5 also sets a new state of the art in several areas, scoring 94.6% on AIME 2025, 74.9% on the real-world coding benchmark SWE-bench Verified, and 84.2% on MMMU. With the extended reasoning of GPT-5 Pro, the model also scored 88.4% on GPQA (Graduate-Level Google-Proof Q&A), another state-of-the-art result.

Focus on improving three major scenarios: writing, programming and health consultation

OpenAI reports that the three most common use cases in ChatGPT are writing, programming, and health, and that GPT-5 further improves performance in all three.

OpenAI states that GPT‑5 is its most powerful programming model to date. It shows significant improvements in complex front-end generation and in debugging large codebases: from a single prompt, it can generate attractive, responsive websites, applications, and games, showing a strong aesthetic sense. GPT‑5 also excels at in-depth analysis of codebases, accurately answering questions about how code modules work and how they interact.

Beyond programming, GPT‑5 also performs very well on agentic tasks, setting new records on benchmarks for instruction following (69.6% on Scale MultiChallenge) and tool calling (96.7% on τ²-bench telecom).

On the LongFact and FactScore benchmarks, GPT‑5's factual error rate is about 80% lower than o3's. This makes GPT‑5 particularly suitable for agentic tasks with high correctness requirements, especially in key areas such as code generation, data processing, and decision support.

In creative writing, GPT-5 can produce text with literary depth, rhythm, and resonance. It is more reliable on writing tasks with structural ambiguity, such as sustaining a consistent iambic meter, and it expresses ideas clearly and forcefully while respecting the chosen form. This carries over to everyday work such as drafting and polishing reports, emails, and memos.

Notably, to control the default length of GPT‑5's answers, OpenAI has also added a new verbosity API parameter. It supports three values: low, medium, and high. If an explicit instruction conflicts with the verbosity parameter, the explicit instruction takes precedence. For example, if a user asks GPT-5 to "write a five-paragraph essay," the response should always contain five paragraphs.
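As a rough illustration of the verbosity setting, the sketch below validates the three documented values and builds a request payload. The helper name `build_request`, the "medium" default, and the payload layout (a `text` object holding `verbosity`) are assumptions made for this example; the official API reference is the authority on the actual request shape. No network call is made.

```python
VALID_VERBOSITY = ("low", "medium", "high")


def build_request(prompt: str, verbosity: str = "medium") -> dict:
    """Build a hypothetical request payload that carries a verbosity setting."""
    if verbosity not in VALID_VERBOSITY:
        raise ValueError(
            f"verbosity must be one of {VALID_VERBOSITY}, got {verbosity!r}"
        )
    return {
        "model": "gpt-5",
        "input": prompt,
        # Assumed field layout; verify against the official API reference.
        "text": {"verbosity": verbosity},
    }


if __name__ == "__main__":
    # Ask for a short default answer length.
    print(build_request("Explain what a router does.", verbosity="low"))
```

Because explicit instructions take precedence, a prompt such as "write a five-paragraph essay" sent with verbosity set to low should still yield five paragraphs; the parameter only shapes the default length.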

On health-related questions, GPT-5 achieved a record score of 46.2% on the HealthBench Hard benchmark. It can proactively flag potential health concerns and give more precise recommendations based on the user's background, knowledge level, and geographic location.

OpenAI has been busy lately. It had just claimed a new state-of-the-art position in the open-source field with gpt-oss, and now it has released the highly anticipated GPT-5. Shipping several products in quick succession demonstrates its technical strength. As for how the model actually fares in performance and safety, it is better to "let the bullets fly for a while" and wait for real-world testing.

References:

1. https://www.theverge.com/openai/748017/gpt-5-chatgpt-openai-release

2. https://cdn.openai.com/pdf/8124a3ce-ab78-4f06-96eb-49ea29ffb52f/gpt5-system-card-aug7.pdf


