HyperAI

Cerebras and OpenAI Launch Codex-Spark: Ultra-Fast AI Model for Real-Time Coding with Wafer-Scale Engine

Cerebras has announced the launch of OpenAI’s new GPT-5.3-Codex-Spark model, now available in research preview, marking the first major milestone in the collaboration between Cerebras and OpenAI. Running on the Cerebras Wafer-Scale Engine, the model delivers over 1,000 tokens per second, enabling near-instant feedback in real-time coding environments.

Codex-Spark is designed for agentic software development, where responsiveness is as critical as intelligence. Autonomous AI agents can now work for hours or even days without human intervention, but developers often face long delays and feel disconnected from the process. Codex-Spark addresses this with fast, interactive performance that keeps developers in control, allowing them to guide, refine, and steer the AI’s work in real time.

Codex-Spark is a highly capable small model tailored for fast inference. On benchmarks such as SWE-Bench Pro and Terminal-Bench 2.0, it outperforms GPT-5.1-Codex-mini while completing tasks significantly faster. It excels at making targeted code edits, revising development plans, and answering context-specific questions about a codebase, which makes it well suited to rapid prototyping, UI/UX experimentation, and iterative design.

“Cerebras has been a great engineering partner, and we’re excited to add fast inference as a core platform capability,” said Sachin Katti, Head of Compute at OpenAI. “Bringing wafer-scale compute into production gives us a new way to keep Codex responsive for latency-sensitive tasks, and we’re eager to learn from developer feedback on how to seamlessly integrate these compute capabilities into a unified workflow.”

The Cerebras Wafer-Scale Engine is purpose-built for AI workloads and features the largest on-chip memory of any AI processor, which enables inference at thousands of tokens per second per user. The architecture is designed to scale across thousands of systems, extending high-speed memory capacity into the multi-terabyte range and supporting trillion-parameter models in both training and inference.

Codex-Spark is currently available as a research preview for ChatGPT Pro users through the Codex app, command-line interface, and VS Code extension. API access will be rolled out to select design partners in the coming weeks. This release is just the beginning: Cerebras plans to bring similar high-speed inference capabilities to the largest frontier models by 2026.
