4 days ago
LLM
Transformer

Xiaomi launches global MiMo API token plan

Xiaomi's MiMo API Open Platform announced the global launch of its updated pricing and the conclusion of its token incentive program on May 27, 2026. The core of the announcement is a permanent price cut for the MiMo-V2.5 series, with rates up to 99% lower than before. The new pricing no longer varies with input length, applies globally, and is effective from midnight Beijing time on May 27.

Concurrently, the platform completed its Quadrillion Token Creator Incentive Program. Launched on April 28, the initiative aimed to distribute 100 trillion tokens to developers, and the target was reached ahead of schedule by 4:08 PM Beijing time on May 26. Although the general event has ended, the Apache Software Foundation retains its exclusive long-term benefits, which are unaffected by the program's end.

For existing users, all Token Plan quota credits were fully reset at the time of the pricing change. The reset applies to current subscribers within their validity periods, including those who received credits through the incentive program or who hold the exclusive Apache Software Foundation benefits, and is intended to move all users smoothly onto the new billing rules. Xiaomi has also prepared surprise gifts for past paying users whose plans have expired, with details to be released in the coming week.

The price cuts rest on substantial optimizations to the inference system. Xiaomi's technical team implemented Sliding Window Attention (SWA) on top of SGLang's HiCache. This cut the volume of KV cache data transferred across the GPU, CPU, and SSD storage tiers to approximately one-seventh of the previous amount. As a result, the number of cacheable tokens increased nearly fivefold, raising cache hit rates and overall inference efficiency. Improvements to the expert parallelism scheme and the input length bucketing strategy further increased cluster input throughput. Together, these changes allow Xiaomi to lower the service cost per token without compromising service quality.

Xiaomi states that the value of technology lies in its widespread adoption. By using continuous innovation to provide high-performance models at low cost, the company aims to meet large-scale inference demand and promote the development of a complete AI infrastructure chain. Its stated mission remains to give more people access to better AI models through the MiMo platform.
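
For readers wondering why window-aware caching shrinks the KV cache, the following back-of-the-envelope sketch may help. It is not Xiaomi's or SGLang's implementation. The layer counts, window size, and per-token byte figure are hypothetical values chosen for illustration, and the result is not meant to reproduce the one-seventh or fivefold figures reported above. The idea is simply that a layer using sliding window attention only ever reads the most recent `window` tokens, so older keys and values for that layer need not be stored or moved between storage tiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridConfig:
    """Hypothetical layer layout; NOT MiMo's published configuration."""
    full_layers: int                # layers that attend to the whole prefix
    swa_layers: int                 # layers that attend only to a sliding window
    window: int                     # sliding-window size, in tokens
    bytes_per_token_per_layer: int  # K and V combined, for one token in one layer


def kv_bytes_naive(cfg: HybridConfig, seq_len: int) -> int:
    """Cache every token for every layer, ignoring the window."""
    layers = cfg.full_layers + cfg.swa_layers
    return layers * seq_len * cfg.bytes_per_token_per_layer


def kv_bytes_window_aware(cfg: HybridConfig, seq_len: int) -> int:
    """Full-attention layers keep the whole prefix; SWA layers keep at most `window` tokens."""
    full_tokens = cfg.full_layers * seq_len
    swa_tokens = cfg.swa_layers * min(seq_len, cfg.window)
    return (full_tokens + swa_tokens) * cfg.bytes_per_token_per_layer


def reduction_factor(cfg: HybridConfig, seq_len: int) -> float:
    """How many times smaller the window-aware cache is than the naive one."""
    return kv_bytes_naive(cfg, seq_len) / kv_bytes_window_aware(cfg, seq_len)


if __name__ == "__main__":
    # Assumed values for illustration only.
    cfg = HybridConfig(full_layers=8, swa_layers=40, window=128,
                       bytes_per_token_per_layer=4096)
    for seq_len in (64, 4096, 32768):
        print(seq_len, round(reduction_factor(cfg, seq_len), 2))
```

Under these assumed numbers the saving grows with prompt length: a prompt shorter than the window gains nothing, while a long prompt needs only a fraction of the naive cache. A smaller per-request footprint means more tokens fit in the same GPU, CPU, and SSD budget, which is the mechanism the article credits for the higher cache hit rate.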
