Apple Releases 2025 AI Foundation Model Report, Ruoming Pang's "Swan Song"
Apple recently released the Apple Intelligence Foundation Language Models Tech Report 2025, its most significant disclosure since the company first detailed its AI foundation models last year. The report is particularly noteworthy because it is likely the last major contribution from Ruoming Pang, the head of Apple's foundation model team, before he departs for Meta on a reportedly multi-million-dollar compensation package. Pang shared the report on social media and formally handed leadership over to Zhifeng Chen.

The document outlines a dual-track model strategy designed to balance performance, efficiency, and privacy. The first track is a lightweight, roughly 3-billion-parameter model optimized for on-device deployment on iPhone, iPad, and Mac, with deep optimizations for Apple's proprietary silicon to ensure smooth operation across devices. The second track is a scalable server model that runs on Apple's private cloud infrastructure, handling more complex user requests while maintaining privacy protections comparable to on-device processing. This "device-cloud synergy" keeps simple tasks local and offloads more demanding ones to the cloud.

To boost the efficiency of the on-device model, Apple introduced an architecture feature called "KV cache sharing." The model is divided into two blocks: the second block (37.5% of the layers) reuses the key-value cache generated by the first (62.5% of the layers), cutting KV cache memory usage by 37.5% and shortening the time to first token. A toy calculation of these savings appears after the benchmark results below.

For the server-side model, Apple developed a novel Transformer architecture called "Parallel-Track Mixture-of-Experts" (PT-MoE). It decomposes a large model into smaller, parallel processing units called "tracks," each of which processes tokens independently and synchronizes only at specific points, minimizing communication bottlenecks and improving both training and inference efficiency. MoE layers within each track let the model scale effectively, handling complex tasks at low latency without sacrificing quality.

Apple's commitment to privacy is evident in how it sources training data, which comes from three channels: licensed content from publishers, publicly available web data collected by Apple's crawler Applebot, and high-quality synthetic data. Apple stresses that it never uses private user data or interaction logs for training, and that Applebot honors the robots.txt protocol so website owners can opt out. For multimodal training, the company processed more than 10 billion high-quality image-text pairs and 50 billion synthetic image captions, running them through advanced filtering and purification pipelines.

In standard benchmarks such as MMLU, the on-device model matches or outperforms open-source peers like Qwen-2.5-3B and Gemma-3-4B. The server model compares favorably with LLaMA 4 Scout but still trails much larger models such as Qwen-3-235B and GPT-4o. Human evaluations, however, indicate strong performance across multiple languages and tasks.
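To make the cache-sharing arithmetic concrete, here is a minimal Swift sketch. The struct, layer count, and per-layer cache size are hypothetical, invented purely for illustration; the report itself specifies only the 62.5%/37.5% layer split and the resulting 37.5% reduction in KV cache memory.

```swift
// Toy illustration of the KV-cache-sharing arithmetic described above.
// All names and sizes are hypothetical; only the layer-split percentages
// come from Apple's report.

struct CacheConfig {
    let totalLayers: Int
    let sharingFraction: Double   // fraction of layers that reuse another block's cache
    let bytesPerLayerCache: Int   // KV cache footprint of one layer, in bytes
}

func kvCacheFootprint(_ c: CacheConfig) -> (baseline: Int, shared: Int) {
    // Baseline: every layer keeps its own key-value cache.
    let baseline = c.totalLayers * c.bytesPerLayerCache
    // With sharing: layers in the second block (37.5% of the stack) read
    // the cache produced by the first block instead of writing their own.
    let cachingLayers = Int((1.0 - c.sharingFraction) * Double(c.totalLayers))
    let shared = cachingLayers * c.bytesPerLayerCache
    return (baseline, shared)
}

// Example: a hypothetical 32-layer model with a 1 MiB per-layer cache.
let config = CacheConfig(totalLayers: 32, sharingFraction: 0.375, bytesPerLayerCache: 1 << 20)
let (baseline, shared) = kvCacheFootprint(config)
print("Savings: \(100.0 * Double(baseline - shared) / Double(baseline))%")  // ~37.5%
```

Because the second block never writes its own cache, the saving scales linearly with the sharing fraction, which is why the memory reduction matches the 37.5% layer split exactly.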
Finally, Apple unveiled a "Foundation Models" framework for developers, giving apps direct access to the roughly 3-billion-parameter on-device model. The framework is deeply integrated with the Swift programming language and includes a feature called "guided generation," which lets developers produce structured Swift data types from model output with minimal code, simplifying the integration of AI capabilities into applications. Apple emphasizes that the framework is built around responsible-AI principles, incorporating multiple safety mechanisms to help developers create intelligent, privacy-aware apps. Together, the report and the framework underscore Apple's determination to advance AI while protecting user privacy, positioning the company to compete in a rapidly evolving AI landscape.
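As an illustration of what guided generation looks like in practice, here is a minimal Swift sketch against the FoundationModels framework. The @Generable and @Guide macros and LanguageModelSession follow Apple's published API for the framework, but the TripSuggestion type and the prompt are hypothetical, and the snippet is an illustrative sketch rather than sample code from the report.

```swift
import FoundationModels

// A hypothetical structured type the model is asked to fill in.
// @Generable makes the type a valid guided-generation target;
// @Guide supplies per-field instructions to the model.
@Generable
struct TripSuggestion {
    @Guide(description: "A short, catchy title for the trip")
    var title: String

    @Guide(description: "Three suggested activities, one sentence each")
    var activities: [String]
}

func suggestTrip() async throws {
    // A session backed by the on-device foundation model.
    let session = LanguageModelSession()

    // Guided generation: the output is constrained to decode directly
    // into the Swift type instead of arriving as free-form text.
    let response = try await session.respond(
        to: "Suggest a weekend trip to Kyoto.",
        generating: TripSuggestion.self
    )
    print(response.content.title)
    response.content.activities.forEach { print("- \($0)") }
}
```

The appeal of this design is that developers never parse model output by hand: the framework constrains decoding to the declared type, so a malformed response is surfaced as a thrown error rather than silently corrupt data.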