NVIDIA Warp accelerates differentiable physics code for AI
NVIDIA Warp is emerging as a critical framework for accelerating and differentiating computational physics code within AI-driven workflows. As Computer-Aided Engineering shifts from human-centric processes to AI models that require high-fidelity physics data, simulators that are GPU-native, fast, and seamlessly integrated into machine learning pipelines have become essential.

Warp bridges the gap between CUDA and Python, letting developers write high-performance kernels as standard Python functions that are JIT-compiled for GPU execution. Unlike tensor-based frameworks that operate on entire N-dimensional arrays, Warp enables flexible, element-level control flow, making it well suited to complex simulation logic involving conditionals and selective updates.

A key feature of Warp is its native support for automatic differentiation. Physics solvers can be differentiated end to end, enabling optimization and training workflows while remaining interoperable with popular frameworks such as PyTorch and JAX.

To demonstrate this potential, NVIDIA presented a 2D Navier-Stokes solver for decaying turbulence built entirely in Warp. The solver uses the vorticity-streamfunction formulation, discretizing the transport equation on a grid with a third-order Runge-Kutta scheme and solving the Poisson equation via Fast Fourier Transforms in Fourier space. The implementation highlights two core building blocks: finite-difference discretization with time marching, and a tile-based FFT Poisson solver. By mapping computational grid points to GPU threads, the framework achieves massive parallelism.

Furthermore, the solver can be made differentiable by pre-allocating arrays for intermediate states and recording kernel launches on a tape. This reverse-mode automatic differentiation computes gradients accurately without the computational cost of finite-difference approximations, making gradient-based optimization feasible for large-scale simulations.
The article illustrates the practical value of this approach through an optimal perturbation problem, in which the system identifies the initial vorticity perturbations that maximize trajectory divergence. This capability is crucial for flow control and for understanding dynamic structures in turbulent flows.

Real-world industrial applications validate Warp's performance and scalability. Autodesk Research developed a differentiable Lattice Boltzmann solver in Warp that ran approximately eight times faster than JAX on a single NVIDIA A100 GPU while using significantly less memory. Google DeepMind introduced MuJoCo Warp, a backend for multibody dynamics that delivered speedups of up to 252 times for locomotion and 475 times for manipulation tasks compared with JAX on comparable hardware. Additionally, C-Infinity AutoAssembler uses Warp to process CAD assemblies for AI planning, achieving speedups of up to 669 times over optimized CPU baselines for spatial intelligence tasks.

By letting developers retain the control flow of physics simulations while leveraging GPU acceleration and automatic differentiation, Warp facilitates a new generation of AI-driven engineering. It supports complex, staged workflows that integrate seamlessly with existing deep learning pipelines, offering substantial gains in performance, memory efficiency, and scalability for fields ranging from fluid dynamics to robotics and manufacturing.
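The optimal perturbation workflow mentioned above can be illustrated on a hypothetical toy system: linear dynamics x_{t+1} = A x_t stand in for the flow solver, and an analytic gradient stands in for what a tape's backward pass would supply. Gradient ascent over a unit-norm initial perturbation then maximizes trajectory divergence; every name here is an illustrative assumption, not the article's code.

```python
import numpy as np

rng = np.random.default_rng(0)
A = 0.5 * rng.standard_normal((4, 4))  # hypothetical linear dynamics
T = 8
M = np.linalg.matrix_power(A, T)       # maps initial perturbation to final separation

def divergence(delta):
    # squared trajectory separation after T steps
    return float(np.sum((M @ delta) ** 2))

def grad(delta):
    # analytic gradient; in Warp this would come from a tape's backward pass
    return 2.0 * M.T @ (M @ delta)

delta = rng.standard_normal(4)
delta /= np.linalg.norm(delta)
eta = 1.0 / np.linalg.norm(M, 2) ** 2  # step size scaled to the problem
for _ in range(500):
    delta = delta + eta * grad(delta)  # ascend the divergence objective
    delta /= np.linalg.norm(delta)     # keep perturbation energy fixed
```

For this linear toy the optimum is the leading right singular vector of M, which makes the result easy to check; the article's solver replaces M with the full nonlinear Navier-Stokes trajectory, where the gradients require reverse-mode differentiation through every kernel launch.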
