Date

16 days ago

Size

657.01 MB

Introduction

Triton is a language and compiler for parallel programming, designed to provide a Python-based programming environment for efficiently writing custom DNN computation kernels that can run at maximum throughput on GPU hardware.

This project is a Triton tutorial series that progresses from basic to advanced topics: vector operations, matrix operations, layer normalization, attention mechanisms, and FP8 matrix multiplication.

1. Basic Operation Tutorial

1.1 Vector Addition

01-vector-add.cn.ipynb – An introductory tutorial to vector addition, introducing the basic Triton programming model.
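As a rough illustration of the block-based programming model this entry refers to, the sketch below emulates it in plain Python rather than Triton. Each "program" handles one fixed-size block of elements, and a bounds check plays the role of a mask on the last, partially filled block. The names `add_block`, `vector_add`, `pid`, and `block_size` are illustrative assumptions and are not taken from the notebook.

```python
def add_block(x, y, out, pid, block_size):
    """Emulate one program instance that owns a single block of elements."""
    n = len(x)
    start = pid * block_size
    for offset in range(start, start + block_size):
        if offset < n:  # mask: skip lanes that fall past the end of the data
            out[offset] = x[offset] + y[offset]


def vector_add(x, y, block_size=4):
    """Launch enough program instances to cover every element."""
    assert len(x) == len(y)
    n = len(x)
    out = [0.0] * n
    num_programs = (n + block_size - 1) // block_size  # ceiling division
    for pid in range(num_programs):  # on a GPU these instances run in parallel
        add_block(x, y, out, pid, block_size)
    return out
```

With five elements and a block size of four, two program instances are launched and the second one has three masked-off lanes.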

2. Core Operator Tutorial

2.1 Fused Softmax

02-fused-softmax.cn.ipynb – A fused softmax operator, used to introduce kernel fusion and reduction operations.
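To make the idea of fusion and reduction concrete, here is a minimal plain-Python sketch of a row-wise, numerically stable softmax. It is not the notebook's kernel. The point it illustrates is that the max reduction, the exponentiation, the sum reduction, and the division all happen while a row is held locally, instead of writing intermediate results back to memory between steps. The function names are assumptions.

```python
import math


def softmax_row(row):
    """Softmax over one row, with every step applied while the row is local."""
    row_max = max(row)                            # reduction 1: row maximum
    exps = [math.exp(v - row_max) for v in row]   # subtract max for stability
    denom = sum(exps)                             # reduction 2: row sum
    return [e / denom for e in exps]


def softmax(matrix):
    """Apply the fused row routine to each row; rows are independent."""
    return [softmax_row(row) for row in matrix]
```

Subtracting the row maximum before exponentiating keeps large inputs such as 1000.0 from overflowing.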

2.2 Matrix Multiplication

03-matrix-multiplication.cn.ipynb – High-performance matrix multiplication implementation
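High-performance GPU matrix multiplication is usually organized around tiles: each output tile is accumulated from matching tiles of the two inputs along the shared dimension. The plain-Python sketch below shows only that loop structure, assuming hypothetical tile sizes `block_m`, `block_n`, and `block_k`. It says nothing about the notebook's actual tuning choices.

```python
def matmul_tiled(a, b, block_m=2, block_n=2, block_k=2):
    """Multiply a (m x k) by b (k x n) one output tile at a time."""
    m, k = len(a), len(a[0])
    assert len(b) == k, "inner dimensions must match"
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, block_m):          # tile rows of the output
        for j0 in range(0, n, block_n):      # tile columns of the output
            for k0 in range(0, k, block_k):  # walk the shared dimension
                for i in range(i0, min(i0 + block_m, m)):
                    for j in range(j0, min(j0 + block_n, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + block_k, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c
```

The `min(...)` bounds handle matrices whose dimensions are not multiples of the tile size, which is the role masking plays in a real kernel.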

2.3 Layer Normalization

05-layer-norm.cn.ipynb – Layer normalization operator implementation
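For reference, the forward computation of layer normalization on a single row can be sketched in plain Python as follows. The parameter names `weight`, `bias`, and `eps`, and the default value of `eps`, are assumptions chosen for illustration and may differ from the notebook.

```python
import math


def layer_norm(row, weight, bias, eps=1e-5):
    """Normalize one row to zero mean and unit variance, then scale and shift."""
    n = len(row)
    mean = sum(row) / n
    var = sum((v - mean) ** 2 for v in row) / n   # biased variance
    rstd = 1.0 / math.sqrt(var + eps)             # reciprocal standard deviation
    return [(v - mean) * rstd * w + b for v, w, b in zip(row, weight, bias)]
```

With unit weights and zero biases, the output has a mean of approximately zero and a variance slightly below one because of `eps`.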

3. Advanced Features Tutorial

3.1 Low-Memory Dropout

04-low-memory-dropout.cn.ipynb – Memory-optimized Dropout implementation
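One common way to reduce dropout's memory footprint is to regenerate the keep/drop mask from a seed instead of storing it between the forward and backward passes. The sketch below assumes that approach and uses Python's `random.Random` as a stand-in for a GPU random number generator. The function names are hypothetical, and the notebook may implement the idea differently.

```python
import random


def dropout_mask(n, p, seed):
    """Recreate the same keep/drop decisions from the seed alone."""
    rng = random.Random(seed)
    return [rng.random() > p for _ in range(n)]


def seeded_dropout(x, p, seed):
    """Forward pass: zero dropped elements and rescale the kept ones."""
    keep = dropout_mask(len(x), p, seed)
    scale = 1.0 / (1.0 - p)
    return [v * scale if k else 0.0 for v, k in zip(x, keep)]


def seeded_dropout_backward(grad_out, p, seed):
    """Backward pass: rebuild the mask from the seed; no stored mask needed."""
    keep = dropout_mask(len(grad_out), p, seed)
    scale = 1.0 / (1.0 - p)
    return [g * scale if k else 0.0 for g, k in zip(grad_out, keep)]
```

Only the seed has to be kept around for the backward pass, so the extra memory cost is a single integer rather than one flag per element.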

3.2 Fused Attention

06-fused-attention.cn.ipynb – Implementation of Transformer attention mechanism
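Fused attention kernels typically avoid materializing the full score matrix by processing keys and values block by block with an "online" softmax that rescales partial results whenever a larger score appears. The plain-Python sketch below shows that technique for a single query vector, assuming the notebook follows this general style. The names and the `block` parameter are illustrative only.

```python
import math


def attention_online(q, keys, values, block=2):
    """Attention for one query, streaming over key/value blocks."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = float("-inf")
    denom = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block):
        k_blk = keys[start:start + block]
        v_blk = values[start:start + block]
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        correction = math.exp(running_max - new_max)  # 0.0 on the first block
        denom *= correction                           # rescale earlier partial sums
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            denom += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / denom for a in acc]


def attention_reference(q, keys, values):
    """Straightforward softmax(q . K^T / sqrt(d)) . V for comparison."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]
```

The streaming version should agree with the reference up to floating-point rounding regardless of the block size.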

3.3 Libdevice External Functions

07-extern-functions.cn.ipynb – Using the tl_extra.libdevice external library

3.4 Grouped GEMM

08-grouped-gemm.cn.ipynb – Grouped general matrix multiplication (GEMM) implementation

3.5 Persistent FP8 Matrix Multiplication

09-persistent-matmul.cn.ipynb – Optimized matrix multiplication at FP8 precision using persistent kernels

3.6 Block-Scaled Matrix Multiplication

10-block-scaled-matmul.cn.ipynb – Block-scaled matrix multiplication implementation

Reference Resources

This notebook is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.

Triton Compiler Tutorial
