MiniMax M2.7 advances agentic workflows on NVIDIA
MiniMax has released M2.7, an advanced iteration of its M2 series designed to scale agentic workflows on NVIDIA platforms. The update targets complex applications in software engineering, machine learning research, and office automation. As an open-weights model, M2.7 is accessible via NVIDIA and the broader open-source inference ecosystem.

The MiniMax M2 family uses a sparse mixture-of-experts (MoE) architecture that balances high capability with operational efficiency. The model contains 230 billion total parameters, but only 10 billion are active for any given token, an activation rate of roughly 4.3%. Each MoE layer holds 256 local experts, of which only the eight most relevant to a given input are activated. For stable training at scale, the architecture combines multi-head causal self-attention with Rotary Position Embeddings and query-key root-mean-square normalization. These optimizations let the model handle a 200,000-token context window while keeping inference costs low, making it well suited to coding challenges and long-running agentic tasks.

To ease the deployment of long-running autonomous agents, NVIDIA introduced NemoClaw, an open-source reference stack. NemoClaw simplifies the setup of always-on assistants by providing a secure environment through the NVIDIA OpenShell runtime. Developers can launch this environment with a single command on the NVIDIA Brev cloud AI GPU platform, enabling safe interaction with open models such as M2.7.

Inference performance improved significantly through collaboration with the open-source community to integrate high-performance kernels into vLLM and SGLang. On NVIDIA Blackwell Ultra GPUs, these optimizations yielded substantial throughput gains: in tests with 1K-token inputs and 1K-token outputs, vLLM achieved up to a 2.5x throughput improvement within one month.
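The sparse routing described above, in which a router selects only a few experts per token, can be sketched in a few lines. The snippet below is an illustrative softmax top-k router, not MiniMax's exact gating function; the expert count and top-k value are scaled down from M2.7's reported 256 experts and top-8 routing for readability.

```python
import math
import random

NUM_EXPERTS = 8  # illustrative; M2.7 reportedly uses 256 local experts per layer
TOP_K = 2        # illustrative; M2.7 reportedly routes each token to the top 8

def softmax(logits):
    """Numerically stable softmax over a list of router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, top_k=TOP_K):
    """Select the top-k experts for one token and renormalize their gate weights."""
    probs = softmax(router_logits)
    # Indices of the k largest gate probabilities.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    gate_sum = sum(probs[i] for i in chosen)
    # Map: expert index -> normalized mixing weight (weights sum to 1).
    return {i: probs[i] / gate_sum for i in chosen}

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
print(route_token(logits))
```

Because only the selected experts' feed-forward weights are touched per token, compute scales with the active parameter count (10B) rather than the total (230B), which is the source of the low activation rate and low inference cost.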
Similarly, SGLang delivered up to a 2.7x throughput increase under the same conditions. These enhancements ensure that the high-capacity MoE model can operate efficiently in real-world production settings.

Deployment options are diverse, covering both experimentation and enterprise scaling. Developers can test the model using free, GPU-accelerated endpoints hosted on NVIDIA GPUs via the build.nvidia.com portal. For production workloads, NVIDIA NIM offers optimized, containerized inference microservices that can be deployed on premises, in the cloud, or in hybrid environments.

The NVIDIA NeMo Framework supports post-training needs. With the NeMo AutoModel library, users can fine-tune M2.7 on their own data using ready-made recipes, and the NeMo RL library adds reinforcement learning support, complete with sample configurations for various sequence lengths and validation curves.

MiniMax M2.7 represents a significant step forward for efficient, large-scale AI agents. By combining a sparse MoE architecture with NVIDIA infrastructure and open-source tooling, the release offers a comprehensive path for building, optimizing, and deploying complex AI applications across the development lifecycle. The model and documentation are available on Hugging Face and the NVIDIA developer portal.
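As a quick-start illustration of the hosted-endpoint path, the sketch below builds an OpenAI-style chat-completions payload of the kind such endpoints typically accept. The base URL and model identifier are assumptions for illustration only; check build.nvidia.com for the actual values published for MiniMax M2.7.

```python
import json

# Hypothetical values for illustration; the real endpoint URL and model id
# are listed on build.nvidia.com alongside the hosted model.
BASE_URL = "https://integrate.api.nvidia.com/v1"  # assumed endpoint
MODEL_ID = "minimaxai/minimax-m2.7"               # assumed identifier

def build_chat_request(prompt, max_tokens=512):
    """Construct an OpenAI-style chat-completions payload for a hosted endpoint."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.2,
    }

payload = build_chat_request("Write a function that reverses a linked list.")
# In practice this payload would be POSTed to f"{BASE_URL}/chat/completions"
# with an API key in the Authorization header.
print(json.dumps(payload, indent=2))
```

Because the payload follows the widely used chat-completions schema, the same request shape should also work against a self-hosted vLLM or SGLang server exposing an OpenAI-compatible API, with only the base URL and model id changed.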
