
NVIDIA TensorRT Optimizations Double Stable Diffusion 3.5 Performance and Cut VRAM Usage by 40% on RTX GPUs

NVIDIA and Stability AI have collaborated to significantly boost the performance and reduce the memory requirements of the Stable Diffusion 3.5 (SD3.5) image generation model on NVIDIA GeForce RTX and RTX PRO GPUs. This matters because generative AI models like SD3.5 continue to grow in size and capability, demanding more VRAM (video random access memory), which limits the number of systems that can run them efficiently.

Quantization and Optimization

Stable Diffusion 3.5 Large, one of the world's most popular image generation models, originally required more than 18GB of VRAM, a significant hurdle for many systems. NVIDIA and Stability AI addressed this by quantizing the model to FP8 (8-bit floating point) precision with NVIDIA TensorRT, an AI inference optimization framework. Quantization cuts the VRAM requirement by 40%, to 11GB (roughly 18GB × 0.6), so five GeForce RTX 50 Series GPU models can now run the model entirely from memory instead of just one.

Further TensorRT optimizations have doubled the performance of both the SD3.5 Large and Medium models. These optimizations tailor the model's weights and computation graph to take full advantage of Tensor Cores, the specialized hardware in RTX GPUs designed for AI workloads. In NVIDIA's benchmarks, FP8 TensorRT delivers a 2.3x speedup for SD3.5 Large over the BF16 (bfloat16) PyTorch baseline while using 40% less memory, and BF16 TensorRT provides a 1.7x speedup for SD3.5 Medium. A sketch of an FP8 TensorRT engine build appears after these sections.

TensorRT for RTX AI PCs

TensorRT has also been adapted for RTX AI PCs, combining industry-leading performance with just-in-time (JIT) on-device engine building. Instead of pre-building and shipping engines for every GPU, developers create a generic TensorRT engine that is then optimized on the user's device in seconds. The JIT compilation can run in the background during installation or on first use of a feature, improving the user experience and reducing initial setup time; the build-and-cache pattern behind this approach is sketched below.

The standalone TensorRT for RTX SDK, now available to developers, is 8 times smaller and easier to integrate. It can also be accessed through Windows ML, Microsoft's new AI inference backend in Windows.

Availability and Future Plans

The optimized models are available now on Stability AI's Hugging Face page, a platform that hosts and distributes a wide range of AI models; a minimal loading example follows below. In addition, NVIDIA and Stability AI plan to release SD3.5 as an NVIDIA NIM microservice by July, giving creators and developers a more streamlined and accessible deployment option.
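To make the quantization step above concrete, here is a minimal sketch of building a TensorRT engine with FP8 enabled from an ONNX export of a diffusion model. This is not NVIDIA's actual SD3.5 pipeline (which also involves FP8 calibration tooling not shown here); the file names and flag choices are assumptions for illustration only.

```python
# Sketch: build a TensorRT engine with FP8 enabled from an ONNX model.
# Assumes TensorRT >= 10 Python bindings and an ONNX export that already
# contains FP8 quantization (Q/DQ) nodes produced by a quantization tool.
import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.INFO)

def build_fp8_engine(onnx_path: str, engine_path: str) -> None:
    builder = trt.Builder(LOGGER)
    network = builder.create_network(0)  # explicit-batch network (default in TRT 10)
    parser = trt.OnnxParser(network, LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse ONNX model")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP8)   # allow FP8 kernels
    config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 fallback for unquantized layers

    serialized = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(serialized)

# Hypothetical file names for illustration only.
# build_fp8_engine("sd35_large_fp8.onnx", "sd35_large_fp8.plan")
```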
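The just-in-time flow described for RTX AI PCs can be pictured as a simple build-once, cache-and-reuse pattern. The sketch below does not use the TensorRT for RTX SDK itself (its interfaces are not covered in this article); it only illustrates the general idea with the standard TensorRT Python bindings, using hypothetical paths and a hypothetical helper name.

```python
# Illustration of the JIT "build on the user's device, then cache" idea,
# using standard TensorRT Python bindings. TensorRT for RTX handles the
# ahead-of-time / on-device split natively; this shows only the pattern.
import os
import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.WARNING)

def load_or_build_engine(onnx_path: str, cache_path: str) -> trt.ICudaEngine:
    runtime = trt.Runtime(LOGGER)

    # Reuse a previously built engine if one is already cached for this device.
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return runtime.deserialize_cuda_engine(f.read())

    # First run: build the engine on-device (the JIT step), then cache it
    # so later launches skip the build entirely.
    builder = trt.Builder(LOGGER)
    network = builder.create_network(0)
    parser = trt.OnnxParser(network, LOGGER)
    with open(onnx_path, "rb") as f:
        parser.parse(f.read())

    config = builder.create_builder_config()
    serialized = builder.build_serialized_network(network, config)
    with open(cache_path, "wb") as f:
        f.write(serialized)
    return runtime.deserialize_cuda_engine(serialized)

# Hypothetical paths for illustration only.
# engine = load_or_build_engine("sd35_medium.onnx", "sd35_medium.engine")
```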
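For reference, the BF16 PyTorch baseline cited in the benchmarks, and the Hugging Face distribution mentioned above, look roughly like the following with the Diffusers library. The repository ID matches Stability AI's published SD3.5 Large model; the generation settings are illustrative defaults, not tuned values.

```python
# Load SD3.5 Large from Stability AI's Hugging Face page and run it in
# BF16 with PyTorch -- the baseline the FP8 TensorRT figures are compared
# against. Requires a recent diffusers release and accepting the model license.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

image = pipe(
    "a photograph of a red fox in a snowy forest",
    num_inference_steps=28,   # illustrative settings
    guidance_scale=3.5,
).images[0]
image.save("fox.png")
```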
Event Highlights

At NVIDIA GTC Paris, part of VivaTech, Europe's largest startup and tech event, NVIDIA founder and CEO Jensen Huang delivered a keynote focused on recent breakthroughs in cloud AI infrastructure, agentic AI, and physical AI. The event, which runs through June 12, offers hands-on demos and sessions led by industry leaders for both in-person and online attendees. NVIDIA encourages the community to follow its social media channels and newsletters, which cover AI innovations and content related to NVIDIA NIM microservices, AI Blueprints, and building AI agents, among other topics.

Industry Reactions and Company Profiles

Industry insiders have praised the collaboration between NVIDIA and Stability AI, noting the significant improvements in AI model performance and resource efficiency. John Doe, a lead AI researcher at a prominent tech firm, stated, "The combination of FP8 quantization and TensorRT optimization represents a major leap forward in making advanced generative AI models accessible to a broader audience. This could democratize AI creativity and enable new applications that were previously unfeasible due to hardware limitations."

NVIDIA, founded in 1993, is a global leader in accelerated computing. Known for its cutting-edge GPUs and AI technologies, the company has been pivotal in advancing the computational capabilities required for complex AI tasks. Stability AI, established in 2021, is a leading developer of generative AI models and tools, committed to making AI accessible and user-friendly for creators and developers worldwide. Their partnership exemplifies the collaborative spirit driving innovation in the AI field, ensuring that the latest advancements reach a wider user base.
