
Red Hat Expands AI Collaboration with AWS, Enabling Enhanced Inference on Trainium and Inferentia Chips for Enterprise Gen AI Workloads

Red Hat has announced an expanded collaboration with Amazon Web Services (AWS) to enhance enterprise-grade generative AI deployment on AWS infrastructure using the AWS Trainium and Inferentia AI chips. The partnership aims to give organizations greater flexibility, efficiency, and choice when running AI inference workloads at scale.

As generative AI adoption grows, businesses face mounting demands for scalable, cost-effective inference. According to IDC, by 2027, 40% of organizations are expected to use custom silicon, such as ARM processors or AI-specific chips, to optimize performance, reduce costs, and accelerate innovation. Red Hat’s collaboration with AWS responds directly to this trend by integrating its AI platform with AWS’s purpose-built AI accelerators.

Key components of the collaboration include:

- Red Hat AI Inference Server on AWS AI chips: Built on the vLLM framework, Red Hat AI Inference Server will support AWS Inferentia2 and Trainium3 chips, enabling high-performance, low-latency inference for any generative AI model. The integration is expected to deliver 30–40% better price performance than current GPU-based Amazon EC2 instances (see the client sketch at the end of this article).
- AWS Neuron operator for Red Hat OpenShift: Red Hat and AWS developed an AWS Neuron operator for Red Hat OpenShift, OpenShift AI, and OpenShift Service on AWS, providing a seamless, supported environment for running AI workloads on AWS accelerators and simplifying deployment and management (a scheduling sketch also follows below).
- Simplified access and deployment: Customers will gain easier access to high-capacity AWS AI accelerators through Red Hat’s platform. Red Hat has also released the amazon.ai Certified Ansible Collection for Red Hat Ansible Automation Platform, enabling automated orchestration of AI services on AWS.
- Open source contributions: Red Hat and AWS are jointly optimizing an AWS AI chip plugin for upstream integration into vLLM, the leading open-source inference framework. As the top commercial contributor to vLLM, Red Hat is advancing scalable inference capabilities, now available as a commercially supported feature in Red Hat OpenShift AI 3.

The AWS Neuron community operator is now available in the Red Hat OpenShift OperatorHub for users of OpenShift and OpenShift Service on AWS. Support for Red Hat AI Inference Server on AWS AI chips will enter developer preview in January 2026.

Joe Fernandes, VP and GM of the AI Business Unit at Red Hat, said the collaboration empowers enterprises to deploy and scale AI with greater efficiency, driven by open source innovation and hybrid cloud flexibility. Colin Brace, VP of Annapurna Labs at AWS, highlighted that the partnership offers customers a supported, high-performance path to production AI, combining open source agility with AWS’s purpose-built hardware. CAE CIO Jean-François Gamache noted that the integration with Red Hat OpenShift Service on AWS accelerates digital transformation, enabling faster innovation and AI-driven improvements across critical applications. Techaisle analyst Anurag Agrawal emphasized that the collaboration supports Red Hat’s “any model, any hardware” strategy, helping enterprises shift from costly AI experimentation to sustainable, governed production.

The collaboration builds on Red Hat’s long-standing partnership with AWS, now focused on meeting the evolving needs of hybrid cloud environments as organizations integrate AI into their core operations. Red Hat will showcase the partnership at AWS re:Invent 2025 at booth #839.
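For a feel of what the inference path looks like to an application, here is a minimal client-side sketch. Because Red Hat AI Inference Server builds on vLLM, it is reasonable to assume it exposes vLLM’s standard OpenAI-compatible HTTP API; the endpoint URL and model name below are placeholders, and the accelerator behind the server, whether Inferentia2, Trainium3, or a GPU, is transparent to the caller.

```python
# Minimal sketch: query a vLLM-based inference server over its
# OpenAI-compatible HTTP API. The URL and model name are placeholders;
# which accelerator the server runs on is invisible to the client,
# which is the point of the "any model, any hardware" approach.
import requests

ENDPOINT = "http://localhost:8000/v1/completions"  # placeholder URL

resp = requests.post(
    ENDPOINT,
    json={
        "model": "my-org/served-model",  # placeholder model id
        "prompt": "Explain AI inference accelerators in one sentence.",
        "max_tokens": 64,
        "temperature": 0.7,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```

Keeping the client on the OpenAI-compatible surface means workloads can move between GPU-backed and Neuron-backed serving without code changes, which is what makes the price-performance comparison above actionable.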
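The practical effect of the Neuron operator is that AWS AI accelerators appear as schedulable Kubernetes resources on OpenShift nodes. Below is a hedged sketch of requesting one with the Kubernetes Python client; the container image, namespace, and the exact extended resource name (shown here as aws.amazon.com/neuron) are assumptions for illustration and may differ in the shipped operator.

```python
# Hedged sketch: schedule a pod onto a Neuron-equipped node by requesting
# the extended resource the device plugin advertises. The resource name,
# image, and namespace are illustrative assumptions, not the operator's
# documented contract.
from kubernetes import client, config

config.load_kube_config()  # use the current kubeconfig context

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="neuron-inference-demo"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="inference",
                image="registry.example.com/inference:latest",  # placeholder
                resources=client.V1ResourceRequirements(
                    # Assumed extended resource name for one Neuron device.
                    limits={"aws.amazon.com/neuron": "1"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```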
