
NVIDIA RAPIDS 25.08 Unveils Enhanced GPU Acceleration with New Profiler, Streaming Engine, and Expanded Algorithm Support

4 days ago

NVIDIA RAPIDS 25.08 adds features aimed at performance, scalability, and usability in GPU-accelerated data science. The release brings profiling tools for cuML, major improvements to the Polars GPU engine, and expanded algorithm support.

A key addition is a profiling suite for cuML's zero code change accelerator. Two new profilers show which operations run on the GPU and which fall back to the CPU, and how long each takes. The function-level profiler analyzes an entire script or notebook cell and reports execution time on both devices. In Jupyter notebooks it is activated with the %%cuml.accel.profile cell magic; on the command line it is enabled with the --profile flag. The line-level profiler gives finer-grained insight, reporting execution details for each individual line of code. It is available through %%cuml.accel.line_profile in notebooks or the --line-profile flag for scripts. Together, these tools help users pinpoint performance bottlenecks in their machine learning pipelines.

In the Polars GPU engine, the streaming executor is now the default. This mode can process datasets larger than GPU memory by breaking the data into partitions and streaming them through the GPU. It adds minimal overhead on smaller datasets and delivers speedups of up to nearly 5x on large workloads that exceed VRAM capacity. The streaming executor now supports almost all operations available in the in-memory engine.

Further improvements include full GPU support for struct data types in Polars columns, which removes the previous CPU fallbacks and allows complex nested data to be handled efficiently. The engine also supports a much broader range of string operations, so more real-world string processing can stay on the GPU.
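As a rough illustration of the Polars GPU engine described above, the sketch below builds a lazy query and optionally collects it on the GPU. The helper name total_by_key and the column names key and value are hypothetical. The GPU path assumes the cudf-polars package and an NVIDIA GPU are available. According to the release notes summarized here, a 25.08 installation would use the streaming executor by default, but the code itself does not verify that.

```python
def total_by_key(data: dict, use_gpu: bool = False):
    """Sum `value` per `key` with a lazy Polars query (hypothetical helper)."""
    # Imported inside the function so the file loads even without Polars.
    import polars as pl

    query = (
        pl.LazyFrame(data)
        .group_by("key")
        .agg(pl.col("value").sum())
        .sort("key")  # sort so the row order is deterministic
    )
    if use_gpu:
        # Assumption: cudf-polars is installed and a supported GPU is present.
        # Which executor runs depends on the installed cudf-polars version.
        return query.collect(engine="gpu")
    # Default Polars CPU engine.
    return query.collect()
```

Calling total_by_key({"key": ["a", "b", "a"], "value": [1, 2, 3]}, use_gpu=True) would run the same query on the GPU engine. The query definition is identical in both paths; only the collect call changes.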
In cuML, the 25.08 release adds several new algorithms. Spectral Embedding is now available for dimensionality reduction and manifold learning, with an API compatible with scikit-learn. In addition, LinearSVC, LinearSVR, and KernelRidge are now supported under cuML's zero code change accelerator, bringing the full suite of support vector machines and kernel-based regression into the accelerated workflow.

With this release, support for CUDA 11 has been removed. All new builds and packages require CUDA 12 or later, and users who still need CUDA 11 should pin to RAPIDS 25.06.

Overall, RAPIDS 25.08 offers better diagnostics, more scalable execution, and broader algorithm coverage for accelerated data science. For more details, visit the RAPIDS documentation. Developers are encouraged to share feedback on GitHub or join the RAPIDS Slack community. New users can get started with free courses and hands-on training on accelerated data science techniques.
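To make the zero code change workflow for LinearSVC more concrete, here is a minimal sketch of an ordinary scikit-learn script that does not import cuML. The function name train_and_score and the synthetic dataset are hypothetical. The command lines in the comments assume the accelerator is launched as a Python module, and should be checked against the cuML documentation for the installed version.

```python
# Plain scikit-learn code; cuML is never imported here.
# Assumed invocations (cuML installed, supported GPU present):
#   python -m cuml.accel script.py                 # accelerated run
#   python -m cuml.accel --profile script.py       # function-level profile
#   python -m cuml.accel --line-profile script.py  # line-level profile


def train_and_score(n_samples: int = 2000, seed: int = 0) -> float:
    """Fit a LinearSVC on synthetic data and return held-out accuracy."""
    # Imported inside the function so the file loads even without scikit-learn.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    X, y = make_classification(
        n_samples=n_samples, n_features=20, random_state=seed
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = LinearSVC(max_iter=5000)
    model.fit(X_train, y_train)
    return float(model.score(X_test, y_test))
```

Adding print(train_and_score()) at the bottom of the script and running it through the accelerator would let the profilers report whether the LinearSVC fit ran on the GPU or the CPU. In a notebook, the equivalent is assumed to be loading the extension with %load_ext cuml.accel and then starting a cell with %%cuml.accel.profile or %%cuml.accel.line_profile.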
