HyperAI

CUDA 13.3 Officially Supports C++ Tile Programming, Lowering GPU Development Barriers

CUDA 13.1 introduced the tile-based GPU programming model with support for Python. NVIDIA has now extended the model to C++ developers in CUDA 13.3. CUDA Tile performs computation on multi-dimensional arrays using tiles as the basic unit of work. This hides lower-level details such as SIMT thread scheduling, memory transfers, and asynchronous operations. Developers declare how data is partitioned into tiles and define the mathematical operations between those tiles. The compiler then handles parallelization, shared-memory management, and the use of hardware features such as Tensor Cores, which substantially reduces the complexity of writing GPU kernels.

Compared with the traditional CUDA C++ SIMT model, tile programming requires less hand-written code and is portable across architectures: the same code can target Ampere, Hopper, and newer GPUs without hardware-specific rewrites. Tile-enabled kernels are compiled with `nvcc` using the `--enable-tile` flag, and Nsight Compute now supports profiling these kernels. The requirements are a GPU with Compute Capability 8.0 or higher, driver R580 or later, and CUDA Toolkit 13.3. The feature is available to all CUDA developers, and documentation and API references are available on NVIDIA's website.
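The article does not show the C++ Tile API, so the sketch below makes no attempt to reproduce it. It is a plain, CPU-only C++ analogy of the decomposition the article describes. The matrices are partitioned into fixed-size tiles, and the matrix multiply is written only as operations between whole tiles. All names (`TileView`, `tile_mma`, `tiled_matmul`) are hypothetical and are not part of any NVIDIA API. In a tile programming model, mapping each tile operation onto threads, shared memory, and Tensor Cores would be the compiler's job. In this sketch, they are ordinary loops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, CPU-only illustration of tile-based decomposition.
// This is NOT the CUDA Tile C++ API; all names are invented for this sketch.

// A non-owning view of a T x T tile inside a row-major N x N matrix.
struct TileView {
    float* base;         // pointer to the tile's top-left element
    std::size_t stride;  // row stride of the parent matrix (= N)
    float& at(std::size_t r, std::size_t c) const { return base[r * stride + c]; }
};

// Tile-level operation: acc += a * b for T x T tiles.
// In a tile programming model this is the level at which the developer writes
// code; how it is spread across threads and hardware units is left to the compiler.
template <std::size_t T>
void tile_mma(const TileView& a, const TileView& b, const TileView& acc) {
    for (std::size_t i = 0; i < T; ++i)
        for (std::size_t k = 0; k < T; ++k)
            for (std::size_t j = 0; j < T; ++j)
                acc.at(i, j) += a.at(i, k) * b.at(k, j);
}

// Partition N x N matrices into T x T tiles and express C += A * B purely
// in terms of tile operations. This sketch requires N to be a multiple of T.
template <std::size_t T>
void tiled_matmul(std::vector<float>& A, std::vector<float>& B,
                  std::vector<float>& C, std::size_t N) {
    assert(N % T == 0 && "N must be a multiple of the tile size T");
    // Return a view of tile (ti, tj) of matrix M.
    auto tile = [N](std::vector<float>& M, std::size_t ti, std::size_t tj) {
        return TileView{M.data() + ti * T * N + tj * T, N};
    };
    const std::size_t tiles = N / T;
    for (std::size_t ti = 0; ti < tiles; ++ti)        // each output tile (ti, tj)
        for (std::size_t tj = 0; tj < tiles; ++tj)    // is an independent unit of work
            for (std::size_t tk = 0; tk < tiles; ++tk)
                tile_mma<T>(tile(A, ti, tk), tile(B, tk, tj), tile(C, ti, tj));
}
```

For example, calling `tiled_matmul<4>(A, B, C, 8)` with `A` filled with 1.0, `B` filled with 2.0, and `C` zeroed leaves every entry of `C` equal to 16.0. Each output tile depends only on one row of tiles of `A` and one column of tiles of `B`, which is what allows a tile compiler to schedule output tiles independently.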