Hugging Face and NVIDIA Launch Training Cluster as a Service to Boost Global AI Research
Today at GTC Paris, Hugging Face and NVIDIA announced a collaboration called "Training Cluster as a Service" (TCS). The initiative aims to make large GPU clusters more accessible to research organizations worldwide, so they can train sophisticated AI models without extensive infrastructure or financial investment.

Making GPU Clusters Accessible
Gigawatt-sized GPU superclusters are becoming increasingly common and are often associated with major AI projects. This trend widens the gap between organizations with ample GPU resources and those without. GPUs are available through hyperscalers and through regional and AI-native cloud providers, yet connecting researchers with that capacity remains a challenge. TCS addresses this by offering accessible, flexible GPU clusters, with organizations paying only for the time they use them.

How It Works
Researchers or developers can request a GPU cluster on behalf of their organization via the Hugging Face website at hf.co/training-cluster. Once a request is accepted, Hugging Face and NVIDIA jointly handle sourcing, pricing, provisioning, and setup of the cluster according to the user's specifications, such as size, region, and duration. This process lets even smaller institutions access the compute needed for cutting-edge AI research.

Clusters at Work

Advancing Rare Genetic Disease Research with TIGEM
The Telethon Institute of Genetics and Medicine (TIGEM) focuses on understanding rare genetic diseases and developing innovative treatments. AI models can help predict the effects of pathogenic variants and identify candidate drugs through repositioning. According to Diego di Bernardo, coordinator of TIGEM’s Genomic Medicine program, TCS has made it significantly easier to procure the necessary GPU capacity, accelerating the institute's research.

Advancing AI for Mathematics with Numina
Numina is a non-profit organization dedicated to building open-source AI for mathematical reasoning. It won the 2024 AIMO progress prize and aims to create open alternatives to closed-source models like DeepMind’s AlphaProof. Yann Fleureau, cofounder of Project Numina, said that computing resources have been a critical bottleneck, one that TCS will help Numina overcome.

Advancing Material Science with Mirror Physics
Mirror Physics, a startup specializing in AI systems for chemistry and materials science, is collaborating with the MACE team to advance AI capabilities in these fields. Sam Walton Norwood, CEO and founder of Mirror Physics, noted that TCS is enabling the production of high-fidelity chemical models at an unprecedented scale, which he described as a significant advance for the industry.

Powering the Diversity of AI Research
Clément Delangue, cofounder and CEO of Hugging Face, emphasized that access to large-scale, high-performance compute is crucial for building the next generation of AI models. He said TCS will remove barriers for researchers and companies, enabling them to train advanced models and push the boundaries of AI. The integration with DGX Cloud Lepton, NVIDIA’s cloud service, extends that access while letting users keep working with familiar Hugging Face tools.

Enabling AI Builders with NVIDIA
Alexis Bjorlin, vice president of DGX Cloud at NVIDIA, said the integration gives developers and researchers seamless access to high-performance NVIDIA GPUs across a broad network of cloud providers. The collaboration makes it easier for researchers and organizations to scale AI training workloads while using familiar tools on Hugging Face.

Evaluation by Industry Insiders and Company Profiles
Industry experts and insiders view the collaboration as a significant step toward democratizing access to AI compute. By lowering the financial and technical hurdles, TCS allows a more diverse range of AI projects to move forward. Hugging Face is known for its open-source approach and active community, and NVIDIA is a leader in GPU technology and AI infrastructure. Their joint effort supports smaller organizations and helps sustain the pace of AI innovation across a wide range of sectors.
