
Microsoft Unveils Mu Model to Empower Windows AI Capabilities


Microsoft officially unveiled its latest innovation, the small-parameter model Mu, early this morning. The model has just 330 million parameters, one-tenth the size of Microsoft's previously released Phi-3.5-mini, yet it delivers comparable performance. More impressively, Mu sustains over 100 tokens per second running fully offline on NPU-equipped laptops, a significant breakthrough for small-parameter models.

One of Mu's standout features is its integration into Windows, where it powers an intelligent agent that interprets natural-language commands and performs system operations. A command like "make the mouse pointer larger and adjust the screen brightness" prompts the agent to execute those settings changes directly, making the operating system easier to use (a sketch of how such structured dispatch might look appears below).

Mu's architecture is optimized for small-scale local deployment, particularly on Copilot+ PCs equipped with NPUs. It builds on Microsoft's earlier Phi Silica model and is a decoder-only Transformer. Training combined three main ingredients: a warmup-stable-decay learning-rate schedule, the Muon optimizer, and further training refinements (the schedule is sketched below). Training ran on NVIDIA A100 GPUs. Pre-training on hundreds of billions of high-quality educational tokens taught the model grammar, semantics, and world knowledge; knowledge distillation from the Phi models then boosted parameter efficiency, letting Mu reach similar quality with a fraction of the parameters (see the distillation sketch below).

To make Windows easier to use, Microsoft has been developing an AI agent that understands natural language and modifies system settings seamlessly. The plan is to surface Mu-driven agents through the existing search box, which demands ultra-low-latency responses across a wide range of settings. After testing various models, Microsoft selected Mu as the best fit. The baseline Mu model lost roughly 50% of its accuracy on this task without fine-tuning, so Microsoft scaled the training set to 3.6 million samples (a 1,300-fold increase) and expanded coverage from about 50 settings to several hundred. With techniques such as automated synthetic labeling, metadata-enhanced prompt tuning, diverse phrasing, noise injection, and smart sampling (a noise-injection example follows below), the fine-tuned model met its quality targets.

In testing, the Mu-powered agent excelled at understanding and executing Windows settings changes, with response times consistently under 500 milliseconds. The integration is a significant step toward a more intuitive, user-friendly Windows, pairing the capabilities of modern AI models with efficient resource usage.
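To make the agent example concrete, here is a hypothetical sketch of how a model's structured output could be dispatched to settings handlers. Microsoft has not published the agent's actual interface; the intent schema, action names, and handlers below are illustrative assumptions, not the real Windows API.

```python
# Hypothetical sketch: dispatch a model's structured intent output to
# settings handlers. All names and the JSON schema are assumptions.
import json
from typing import Callable

# Registry mapping intent names to (stand-in) settings handlers.
HANDLERS: dict[str, Callable[[dict], None]] = {
    "set_pointer_size": lambda a: print(f"pointer size -> {a['size']}"),
    "set_brightness":   lambda a: print(f"brightness -> {a['percent']}%"),
}

def dispatch(model_output: str) -> None:
    """Parse the model's JSON intent list and invoke each handler."""
    for intent in json.loads(model_output):
        handler = HANDLERS.get(intent["action"])
        if handler is None:
            raise ValueError(f"unknown action: {intent['action']}")
        handler(intent.get("args", {}))

# "Make the mouse pointer larger and adjust the screen brightness"
# might yield a structured intent list like this one.
dispatch('[{"action": "set_pointer_size", "args": {"size": 3}},'
         ' {"action": "set_brightness", "args": {"percent": 70}}]')
```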
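The warmup-stable-decay schedule mentioned above is a piecewise learning-rate policy: ramp up, hold flat, then anneal. Here is a minimal sketch; the hyperparameter values are illustrative, not Microsoft's actual settings.

```python
# Minimal sketch of a warmup-stable-decay (WSD) learning-rate schedule.
# Step counts and learning rates below are illustrative assumptions.

def wsd_lr(step: int,
           peak_lr: float = 3e-4,
           warmup_steps: int = 2_000,
           stable_steps: int = 80_000,
           decay_steps: int = 18_000,
           min_lr: float = 3e-5) -> float:
    """Piecewise schedule: linear warmup, flat plateau, linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable phase: hold the peak learning rate.
        return peak_lr
    # Decay phase: anneal linearly down to min_lr.
    progress = min(1.0, (step - warmup_steps - stable_steps) / decay_steps)
    return peak_lr + (min_lr - peak_lr) * progress

# Sample the schedule at a few points across training.
print([round(wsd_lr(s), 6) for s in (0, 1_000, 2_000, 50_000, 100_000)])
```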
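Knowledge distillation of the kind the article describes typically trains the small model against the large model's output distribution. Below is a minimal PyTorch sketch of a standard logit-level distillation loss; the temperature and blending weight are assumptions, and the article does not specify Microsoft's exact recipe for distilling Phi into Mu.

```python
# Minimal sketch of logit-level knowledge distillation (student learns
# from teacher logits plus hard labels). Hyperparameters are assumed.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend hard-label cross-entropy with soft-label KL to the teacher."""
    # Soft targets: KL divergence between temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # standard scaling to keep gradients comparable
    # Hard targets: ordinary next-token cross-entropy.
    hard = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                           labels.view(-1))
    return alpha * soft + (1 - alpha) * hard

# Toy usage with random tensors (batch=2, seq=4, vocab=100).
s, t = torch.randn(2, 4, 100), torch.randn(2, 4, 100)
y = torch.randint(0, 100, (2, 4))
print(distillation_loss(s, t, y).item())
```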
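Of the fine-tuning techniques listed, noise injection is the simplest to illustrate: perturb user phrasings with small typos so the model tolerates imperfect input. The particular perturbations below are assumptions for illustration.

```python
# Minimal sketch of noise-injection data augmentation for text:
# randomly drop, transpose, or duplicate characters. The perturbation
# mix and rate are illustrative assumptions.
import random

def inject_noise(text: str, rate: float = 0.05,
                 rng: random.Random | None = None) -> str:
    """Apply character-level noise to a fraction of positions."""
    rng = rng or random.Random(0)
    chars, out, i = list(text), [], 0
    while i < len(chars):
        if rng.random() < rate:
            op = rng.choice(("drop", "swap", "dup"))
            if op == "drop":
                i += 1  # delete this character
                continue
            if op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], chars[i]])  # transpose pair
                i += 2
                continue
            out.extend([chars[i], chars[i]])  # duplicate character
            i += 1
            continue
        out.append(chars[i])
        i += 1
    return "".join(out)

print(inject_noise("make the mouse pointer larger"))
```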
