OpenAI Releases gpt‑oss‑120b and ‑20b: First Open‑Weight Models in Six Years with Powerful, Safe Reasoning
OpenAI has made a significant shift in strategy by releasing two new open-weight large language models, gpt-oss-120b and gpt-oss-20b, with approximately 117 billion and 21 billion parameters respectively. Announced on August 5, the models are freely available on Hugging Face under the Apache 2.0 license, which permits commercial use, redistribution, and customization. The release marks a major departure from OpenAI's historically closed approach: its last open-weight release was GPT-2 in 2019.

Both models use a Mixture-of-Experts (MoE) architecture. gpt-oss-120b has 128 experts with 4 activated per token, while gpt-oss-20b has 32 experts, also with 4 active per token. They support a 128k context window and are natively quantized in MXFP4, keeping their memory footprint small for deployment. The larger model runs on a single 80GB GPU, while the smaller one can operate on a laptop with just 16GB of memory, making the models broadly accessible.

OpenAI trained the models on large-scale text data focused on STEM, general knowledge, and programming, with sensitive topics such as bioweapons, nuclear materials, and dangerous chemicals filtered out. Although the training data sources have not been publicly disclosed, benchmark results are strong. gpt-oss-120b matches or exceeds OpenAI's proprietary o4-mini model on key reasoning tasks such as AIME, GPQA, and MMLU, performs strongly on coding and agentic benchmarks such as SWE-bench and Codeforces, and on HealthBench even surpasses GPT-4o and OpenAI o1. The smaller gpt-oss-20b performs comparably to o3-mini and is well suited to local development; Reddit users have already shared successful runs on consumer hardware, highlighting its practicality.

Notably, the chain-of-thought reasoning in both models was left unsupervised during training, meaning OpenAI did not directly optimize the reasoning traces themselves. This design choice supports independent research into model behavior and chain-of-thought monitoring, in line with OpenAI's stated goal of enabling transparency and accountability in AI.

Safety was a core focus. OpenAI conducted extensive evaluations, including adversarially fine-tuning the models to push them toward high-risk capabilities; even after such testing, gpt-oss-120b did not reach the "high-risk" threshold defined in OpenAI's internal framework. The company also launched a red-teaming challenge with a $500,000 prize pool to surface new security vulnerabilities, reinforcing its commitment to a safer open ecosystem.

The release is widely seen as a strategic response to the rise of open models such as DeepSeek and Qwen, which have gained traction globally. CEO Sam Altman acknowledged that OpenAI had previously "stood on the wrong side" of the open-source movement and emphasized the importance of building an open, democratic AI stack rooted in American values.

The models are supported across major platforms including AWS, Azure, Baseten, and Databricks, as well as local tools such as LM Studio and Ollama (see the sketches below for running the models locally). They also integrate with OpenAI's Responses API, enabling complex agent-based workflows.

This move signals a pivotal moment in AI development, bridging the gap between cutting-edge performance and open access. By putting powerful, freely available models in the hands of developers, researchers, and companies, OpenAI is not only fostering innovation but also shaping the future of responsible, inclusive AI.
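For readers who want to try the release directly, the weights published on Hugging Face can be loaded with the standard Transformers text-generation pipeline. The snippet below is a minimal sketch, not an official example: the model ID `openai/gpt-oss-20b` matches the Hugging Face listing, but the prompt and generation settings are illustrative, and the hardware requirements noted above (roughly 16GB of memory for the 20b model) still apply.

```python
# Minimal sketch: run gpt-oss-20b locally via Hugging Face Transformers.
# Assumes the `transformers` and `torch` packages are installed and that
# enough GPU or system memory is available for the MXFP4-quantized weights.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",  # model ID as published on Hugging Face
    torch_dtype="auto",          # let Transformers pick an appropriate dtype
    device_map="auto",           # place layers on available GPU(s) or CPU
)

messages = [
    {"role": "user", "content": "Explain mixture-of-experts routing in two sentences."},
]

outputs = generator(messages, max_new_tokens=256)
# With chat-style input, generated_text holds the full conversation,
# so the last message is the model's reply.
print(outputs[0]["generated_text"][-1])
```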
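Local runtimes such as Ollama and LM Studio also expose an OpenAI-compatible HTTP endpoint, so existing client code can be pointed at a locally served gpt-oss model with almost no changes. The sketch below assumes Ollama's default port and a `gpt-oss:20b` model tag; both are assumptions to adjust for your own setup (for example, a vLLM or LM Studio server on a different port).

```python
# Minimal sketch: query a locally served gpt-oss model through an
# OpenAI-compatible endpoint. Base URL, API key, and model tag are
# placeholders for a default Ollama install; adapt them to your runtime.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # assumed: Ollama's default endpoint
    api_key="ollama",                      # local servers accept any key
)

response = client.chat.completions.create(
    model="gpt-oss:20b",  # assumed local model tag; check your runtime's model list
    messages=[
        {"role": "user", "content": "Give one practical use case for a 20B open-weight model."},
    ],
)
print(response.choices[0].message.content)
```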