HyperAI

AMD Ryzen AI Max+ 395 Supports Local Execution of 128-Billion-Parameter Large Models

12 days ago

AMD has unveiled a major upgrade to its Ryzen AI Max+ 395, marking a significant leap in on-device AI capabilities. Built on the Zen 5 architecture, the processor now enables local execution of AI models with up to 128 billion parameters, which AMD positions as a first for a consumer device. This advancement pushes the boundaries of what's possible for personal AI workstations.

To achieve this, the Ryzen AI Max+ 395 requires a 128GB unified memory configuration, with 96GB allocated as dedicated graphics memory, and must run llama.cpp on its Vulkan backend, giving developers flexibility and control over model deployment. This setup lets the processor handle complex inference tasks while maintaining high throughput.

The upgrade unlocks support for Meta's 109-billion-parameter Llama 4 Scout model, which weighs in at 66GB and includes advanced features such as vision input and MCP (Model Context Protocol) support. This model, previously accessible only on high-end cloud infrastructure or specialized hardware, can now run locally on a desktop system powered by the Ryzen AI Max+ 395. The key to this breakthrough is the model's Mixture of Experts (MoE) architecture, which activates only a subset of the parameters for each token during inference, dramatically reducing compute demands without sacrificing quality.

In real-world benchmarks, the Ryzen AI Max+ 395 delivers roughly 15 tokens per second on Llama 4 Scout, a competitive speed for a model of this size running locally. It also supports other high-capacity models, including the 68GB, 123-billion-parameter Mistral Large, as well as smaller but capable models such as the 18GB, 30-billion-parameter Qwen3 30B A3B and the 17GB, 27-billion-parameter Google Gemma 3.

One of the most impressive enhancements is in context handling.
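The Mixture of Experts idea described above can be illustrated with a minimal sketch. The toy experts, router weights, and dimensions below are hypothetical, not AMD's or Meta's actual implementation: a router scores every expert, but only the top-k experts actually execute for a given token, so compute scales with k rather than with the total expert count.

```python
import math
import random

random.seed(0)

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, router_weights, top_k=2):
    """Route one token vector through only top_k of the experts."""
    # Router: one logit per expert (dot product of its weights with x).
    logits = [sum(w * v for w, v in zip(wts, x)) for wts in router_weights]
    probs = softmax(logits)
    # Select the top_k experts by router probability.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Renormalize gate weights over the chosen experts only.
    gate_sum = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        expert_out = experts[i](x)  # only these experts ever run
        gate = probs[i] / gate_sum
        out = [o + gate * e for o, e in zip(out, expert_out)]
    return out, chosen

# Toy setup: 8 experts, but only 2 execute per token.
dim = 4
experts = [lambda x, s=i: [v * (s + 1) for v in x] for i in range(8)]
router_weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(8)]

out, chosen = moe_forward([0.5, -0.2, 0.1, 0.9], experts, router_weights, top_k=2)
print(len(chosen))  # 2 of 8 experts activated
```

In a real MoE model the saving is the same in spirit: a 109-billion-parameter model only touches the active experts' weights per token, which is why such models fit interactive inference on a 128GB unified-memory machine.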
The processor now supports a maximum context length of 256,000 tokens, far exceeding the typical 32,000-token limit of most standard models. This enables users to process and analyze extremely long documents, complex codebases, or extensive datasets in a single session, a capability previously reserved for expensive server-grade systems.

Pricing has also become more accessible: a compact AI workstation equipped with the Ryzen AI Max+ 395 and 128GB of memory now sells for around 13,000 RMB, significantly lowering the barrier to entry for high-end AI development and deployment. With this combination of performance, scalability, and affordability, AMD's latest platform is setting a new standard for edge-based AI innovation.
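A quick back-of-the-envelope check puts these figures in perspective (all numbers taken from the article; the 1,200-token answer length is an illustrative assumption):

```python
# Figures quoted in the article.
context_tokens = 256_000   # maximum supported context length
typical_context = 32_000   # common limit on standard models
throughput_tps = 15        # measured generation speed, tokens/second

# The supported context window is 8x the typical limit.
context_ratio = context_tokens // typical_context
print(context_ratio)  # 8

# At 15 tokens/s, a hypothetical 1,200-token answer takes ~80 seconds.
answer_tokens = 1_200
print(answer_tokens / throughput_tps)  # 80.0
```

So while 15 tokens per second is modest compared with data-center GPUs, it is within interactive range for local use, and the 8x larger context window is what makes whole-document and whole-codebase analysis feasible in one session.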
