MiniMax-M2.5 Launches as Cost-Effective Frontier Model with $1/Hour Pricing and Top-Tier Agentic Performance
On February 12, 2026, just weeks after its Hong Kong IPO, Shanghai-based AI company MiniMax unveiled M2.5, a new frontier model that has drawn attention for pairing strong benchmark results with very low cost. M2.5 scores 80.2% on SWE-Bench Verified, within a percentage point of top-tier models such as Claude Opus 4.6 and GPT-5.2, and 51.3% on Multi-SWE-Bench, where it ranks first. It also scores 76.3% on BrowseComp, indicating strong web browsing and search capabilities.
The pricing stands out most: M2.5 costs roughly $1 per hour of continuous operation at 100 tokens per second, which makes it significantly cheaper than its competitors. For context, Claude Opus 4.6 charges $5 per million input tokens and $25 per million output tokens, and even the newly released GLM-5 is priced at $1 per million input tokens and $3.20 per million output tokens.
M2.5 is a 230-billion-parameter Mixture-of-Experts (MoE) model that activates only 10 billion parameters per token, which keeps inference efficient. It is offered through two API versions: Lightning, which delivers twice the throughput of other frontier models, and Standard, which is optimized for low cost.
The model was trained with large-scale reinforcement learning (RL) across more than 200,000 real-world environments, including internal company workflows. Central to this is Forge, MiniMax’s in-house, agent-native RL framework. Forge decouples the training engine from agent scaffolds, so the model learns to generalize across different tool interfaces instead of overfitting to one. Its key techniques include CISPO (Clipped Importance Sampling Policy Optimization), which clips importance-sampling weights rather than token updates, so that every token still contributes to the gradient; in tests, this yielded a 2x speedup over DAPO. Asynchronous scheduling and tree-structured sample merging raised training throughput by roughly 40x.
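To make the difference between clipping updates and clipping weights concrete, here is a minimal Python sketch that computes, token by token, the coefficient each objective places on the gradient of the log-probability. It assumes the CISPO formulation from MiniMax's earlier M1 technical report (a clipped importance weight treated as a constant and multiplied by the advantage), contrasted with PPO-style clipping. The article does not say how Forge implements CISPO for M2.5, and the function names and the epsilon defaults (`eps_low=1.0`, `eps_high=0.2`) are illustrative assumptions, not MiniMax's code.

```python
def ppo_token_weights(ratios: list[float], advantages: list[float],
                      eps: float = 0.2) -> list[float]:
    """Per-token coefficient on grad log pi under PPO-style clipping.

    The surrogate min(r * A, clip(r, 1 - eps, 1 + eps) * A) has zero gradient
    whenever the clipped branch is selected, so those tokens drop out.
    """
    weights = []
    for r, a in zip(ratios, advantages):
        clipped_out = (a > 0 and r > 1 + eps) or (a < 0 and r < 1 - eps)
        # Since dr/dtheta = r * grad log pi, an unclipped token contributes r * A.
        weights.append(0.0 if clipped_out else r * a)
    return weights


def cispo_token_weights(ratios: list[float], advantages: list[float],
                        eps_low: float = 1.0, eps_high: float = 0.2) -> list[float]:
    """Per-token coefficient on grad log pi under a CISPO-style objective.

    The importance weight is clipped and treated as a constant (stop-gradient),
    so it only rescales each token's term sg(r_hat) * A * log pi.
    """
    weights = []
    for r, a in zip(ratios, advantages):
        r_hat = min(max(r, 1 - eps_low), 1 + eps_high)  # clip the weight, not the update
        weights.append(r_hat * a)
    return weights


if __name__ == "__main__":
    ratios = [0.9, 1.1, 1.5, 0.5]        # pi_new / pi_old for four sampled tokens
    advantages = [1.0, 1.0, 1.0, -1.0]
    print(ppo_token_weights(ratios, advantages))    # [0.9, 1.1, 0.0, 0.0]
    print(cispo_token_weights(ratios, advantages))  # [0.9, 1.1, 1.2, -0.5]
```

Under PPO-style clipping, the third and fourth tokens fall outside the trust region and receive no gradient at all. Under the CISPO-style objective, the third token's weight is merely capped at 1 + eps_high = 1.2 and the fourth still contributes, which is the "all tokens contribute to gradients" property described above.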
MiniMax also used process-level rewards and real-world task completion time to improve credit assignment across long agent trajectories. The company says the entire M2.5 training cycle took just two months. For comparison, its earlier M1 reasoning model was trained on 512 H800 GPUs in three weeks at a cost of $534,700, a track record suggesting that MiniMax’s architecture and training methods are both efficient and scalable.
M2.5 shows emergent planning behavior, proactively decomposing tasks before writing code. This improves token efficiency: it uses 3.52M tokens per SWE-Bench task, about 5% fewer than the 3.72M used by M2.1. It also performs well on office productivity tasks, with MiniMax reporting a 59.0% win rate against mainstream models on its internal GDPval-MM benchmark.
The company also highlights MiniMax Agent, its consumer-facing agentic platform, on which users have built more than 10,000 specialized agent configurations. Early feedback from OpenHands suggests M2.5 is occasionally sloppy, for example pushing to the wrong branch or introducing formatting issues, but it remains a powerful, cost-efficient alternative. MiniMax claims that running four M2.5 instances continuously for a year would cost only $10,000.
Looking ahead, MiniMax plans to publish a detailed technical blog post on Forge and its RL scaling laws. Key questions remain: does performance scale linearly with the number of environments, and is MiniMax genuinely advancing the frontier in agentic RL or catching up to others? Either way, its focus on coding, office productivity, and cost efficiency may offer a sustainable path to differentiation in a crowded market.
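The pricing figures quoted in this article can be sanity-checked with simple arithmetic. The Python sketch below assumes that the $1/hour figure covers output tokens generated at 100 tokens per second and ignores input tokens entirely; the helper names are hypothetical. Under those assumptions, it computes the per-million-token price implied by $1/hour, the output-only hourly cost at the Claude Opus 4.6 and GLM-5 output rates quoted earlier, and the yearly cost of four instances.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; ignores leap years
SECONDS_PER_HOUR = 3600


def tokens_per_hour(tokens_per_second: float) -> float:
    """Tokens generated by one instance running continuously for an hour."""
    return tokens_per_second * SECONDS_PER_HOUR


def hourly_output_cost(tokens_per_second: float, usd_per_million: float) -> float:
    """Output-token cost of one instance-hour (input tokens are ignored)."""
    return tokens_per_hour(tokens_per_second) * usd_per_million / 1_000_000


def implied_price_per_million(usd_per_hour: float, tokens_per_second: float) -> float:
    """Per-million-token price implied by an hourly cost figure."""
    return usd_per_hour * 1_000_000 / tokens_per_hour(tokens_per_second)


def fleet_cost(instances: int, usd_per_hour: float,
               hours: float = HOURS_PER_YEAR) -> float:
    """Cost of running several instances continuously for the given hours."""
    return instances * usd_per_hour * hours


if __name__ == "__main__":
    tps = 100  # throughput quoted in the article
    print(tokens_per_hour(tps))                           # 360000
    print(round(implied_price_per_million(1.0, tps), 2))  # 2.78
    print(round(hourly_output_cost(tps, 25.0), 2))        # 9.0 (Opus 4.6 output rate)
    print(round(hourly_output_cost(tps, 3.20), 2))        # 1.15 (GLM-5 output rate)
    print(fleet_cost(4, 1.0))                             # 35040.0
    print(round(10_000 / (4 * HOURS_PER_YEAR), 3))        # 0.285
```

Under these assumptions, four instances at $1/hour would cost about $35,040 per year, so MiniMax's $10,000 figure implies roughly $0.29 per instance-hour. That figure presumably reflects a cheaper configuration than the one quoted at $1/hour; this is an inference, since the article does not give per-token prices for either API version.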
