NextSilicon Unveils Maverick-2 Dataflow Engine and Arbel RISC-V CPU for High-Performance Computing
NextSilicon has unveiled its Maverick-2 dataflow engine, marking a major milestone after eight years and $303 million in funding. The company is introducing a novel computing architecture designed specifically for high-performance computing (HPC), pairing a 64-bit dataflow processor with a custom RISC-V CPU named Arbel. The combination aims to deliver unprecedented efficiency and performance for complex scientific simulations, AI workloads, and large-scale data processing.

At the heart of Maverick-2 is a reconfigurable dataflow engine that fundamentally departs from the traditional von Neumann architecture. Instead of relying on a central instruction-fetch unit and sequential execution, Maverick-2 maps computational tasks directly onto hundreds of arithmetic logic units (ALUs) arranged in a grid. With 224 compute blocks per die, each containing hundreds of ALUs, NextSilicon estimates tens of thousands of ALUs per chip, all operating at 1.5 GHz. The chip is built on TSMC's 5nm process and contains 54 billion transistors.

Unlike CPUs, where only about 2% of silicon is dedicated to actual computation, Maverick-2 shifts the balance dramatically. By eliminating much of the overhead associated with instruction scheduling, branch prediction, and speculative execution, the architecture devotes nearly all of its resources to computation, yielding significantly higher utilization rates, potentially 75-80%, compared with traditional processors.

The key innovation lies in the software stack. NextSilicon's compiler automatically takes existing C, C++, or Fortran code, analyzes its intermediate representation, and maps it onto the dataflow engine in real time. It dynamically reconfigures the compute blocks, which the company calls "mill cores," to optimize performance without human intervention. Mill cores can be added, removed, or replicated in nanoseconds, enabling adaptive, self-optimizing execution.
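To make the compiler claim concrete, here is a hypothetical sketch (not NextSilicon's code) of the kind of plain C loop such a toolchain would target: each iteration is independent, so the multiply-add body can be replicated across many ALUs in the grid rather than executed one iteration at a time on a single core.

```c
#include <stddef.h>

/* Hypothetical example of unmodified C code a dataflow compiler could
 * lift to an intermediate representation and map onto the grid: every
 * iteration is independent, so the fused multiply-add below can be
 * replicated across many ALUs ("mill cores") and run in parallel. */
void saxpy(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];  /* one multiply-add per grid ALU */
    }
}
```

The point of the example is that nothing in the source marks it as "dataflow code"; the parallelism is discovered from the intermediate representation, which is what allows existing HPC codebases to run without rewriting.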
For workloads that don't benefit from dataflow acceleration, the chip includes 32 RISC-V E-cores on the die to handle serial tasks and control logic. These cores are part of the Arbel processor, a fully in-house RISC-V design that NextSilicon describes as a "test chip" but intends to serve as a future host CPU. Arbel features a 10-wide decoder, six integer ALUs, four 128-bit FPUs, 64 KB L1 instruction and data caches, and a 1 MB L2 cache per core. NextSilicon claims it can match the performance of Intel's Lion Cove and AMD's Zen 5 cores.

Maverick-2 is being deployed in production at Sandia National Laboratories, which helped develop the earlier Maverick-1 prototype. Performance benchmarks show strong results. On the GUPS test, Maverick-2 achieved 32.6 GUPS at 460 watts, 22 times faster than a CPU and nearly six times faster than a GPU. On STREAM, it delivered 5.2 TB/s, reaching 83.9% of peak bandwidth and outperforming GPUs by 1.86x in performance per watt. In HPCG, a benchmark built around real-world HPC problems, a dual-chip OAM module delivered 600 gigaflops at 600 watts, matching top GPUs while using half the power. On PageRank, it was 10 times faster than leading GPUs.

The single-die Maverick-2 has a TDP of 400 watts, while the dual-chip OAM version reaches 750 watts. Although peak floating-point performance lags behind top-tier GPUs such as the H100, NextSilicon emphasizes sustained performance and efficiency over theoretical peaks.

The company positions Maverick-2 as a "superchip" solution, combining a host CPU, RISC-V cores, and a massive dataflow engine, offering the programmability of CPUs, the throughput of GPUs, and the efficiency of custom accelerators. While scaling beyond a single socket remains a challenge, the architecture's flexibility and automated optimization make it a compelling alternative for HPC centers and AI developers seeking to break through the limitations of traditional computing models.
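For readers unfamiliar with the GUPS figure cited above: GUPS (giga-updates per second) measures XOR updates at pseudo-random memory locations, stressing memory latency rather than arithmetic throughput. The following is a simplified stand-in for the HPC Challenge RandomAccess kernel, not NextSilicon's benchmark code; the LCG constants are the standard MMIX parameters used here only as an illustrative random stream.

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal sketch of the GUPS access pattern: XOR-update a table at
 * pseudo-random indices. Performance is dominated by random-access
 * memory latency, which is why dataflow hardware with many in-flight
 * updates can outrun a CPU here. Simplified illustration only. */
void gups_updates(uint64_t *table, size_t table_size, size_t n_updates) {
    uint64_t r = 1;  /* fixed seed: the update stream is deterministic */
    for (size_t i = 0; i < n_updates; i++) {
        /* simple LCG stand-in for the benchmark's random stream */
        r = r * 6364136223846793005ULL + 1442695040888963407ULL;
        table[r % table_size] ^= r;
    }
}
```

Because XOR is self-inverse, replaying the same deterministic stream twice returns the table to its initial state, a property the real RandomAccess benchmark also uses for verification.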
