Event Preview | AMD, Muxi Integrated Circuit, ByteDance, Peking University, and Shanghai Innovation Institute Gather in Beijing to Explore AI Compilers From Low-Level Compilation to Scenario Applications

In an era when AI is transforming every industry, a quiet technological revolution is under way in efficiency, deployability, and the sustainability of computing. The AI compiler is the key middleware that connects underlying hardware with upper-level applications. Whether it is TVM, which is widely used in industry, Triton, which has risen rapidly in recent years, or TileLang, an operator programming language that emerged only at the beginning of this year, compilation technology is no longer just the basic guarantee that a model can "run"; it is becoming a key technology for efficient execution and optimized resource utilization.

Innovation and practice around AI compilers continue to emerge, and interest in the field keeps growing. To better connect cutting-edge research with application scenarios, HyperAI will hold the 7th Meet AI Compiler Technology Salon in Beijing on July 5. We are honored to invite four senior experts from AMD, Muxi Integrated Circuit, ByteDance, and Peking University to share their best practices and trend analysis for AI compilers. In addition, Feng Siyuan, assistant professor at the Shanghai Innovation Institute and Apache TVM PMC member, will moderate the roundtable session and lead an in-depth discussion with the speakers on the theme of "a unified compilation ecosystem across hardware."

We have also prepared gifts and a tea break for everyone. Come and join us!

Event Details

⏰ Time: July 5 (Saturday) 13:30-17:45

📍 Location: Garage Coffee, No. 48, Haidian West Street, Haidian District, Beijing

👬 Capacity: 200 (on-site seating is limited, so please register as early as possible)

🙌🏻 Registration: Register via the link below

https://www.huodongxing.com/event/1810501012111

📝 Agenda:

Guests and Agenda

Session 1

Speakers

Talk topic: Supporting the open source community: an analysis of the AMD Triton compiler

Contents: Triton is a programming language proposed by OpenAI that is designed to simplify the development of high-performance GPU kernels. It is widely used in mainstream LLM training and inference frameworks. Users implement GPU kernels by writing Triton code in Python, without having to worry about the details of the underlying GPU architecture, which greatly lowers the difficulty of GPU code development.

AMD has implemented the Triton compiler on its GPU platforms and contributed it to the Triton open source community. Optimizing GPU code performance requires understanding the Triton compiler and its role in kernel performance optimization. This talk discusses the AMD Triton compiler in detail and explains how the compiler improves the performance of Triton on AMD GPU platforms.
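To make the block-level programming style mentioned above more concrete, here is a minimal sketch in plain Python. It only emulates the idea of one "program instance" per block with a bounds mask; it does not use the triton package, and the names `add_kernel`, `launch`, and `block_size` are illustrative assumptions rather than anything taken from the talk.

```python
# Plain-Python emulation of a block-level (Triton-style) vector add.
# This does NOT use the triton package; all names here are illustrative.

def add_kernel(x, y, out, n_elements, block_size, pid):
    """One 'program instance': handles the block of elements owned by pid."""
    block_start = pid * block_size
    offsets = [block_start + i for i in range(block_size)]
    # The mask guards the last block, which may run past the end of the data.
    mask = [off < n_elements for off in offsets]
    for off, valid in zip(offsets, mask):
        if valid:
            out[off] = x[off] + y[off]


def launch(x, y, block_size=4):
    """Emulate a 1D launch grid: one program instance per block."""
    n_elements = len(x)
    out = [0] * n_elements
    grid = (n_elements + block_size - 1) // block_size  # ceiling division
    for pid in range(grid):  # on a GPU these instances would run in parallel
        add_kernel(x, y, out, n_elements, block_size, pid)
    return out


result = launch(list(range(10)), [10] * 10)
```

In a real GPU kernel the per-block loop would be a vectorized load, add, and store, and the compiler would decide how each block maps onto hardware threads.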

In this talk, you will learn:

1. An introduction to AMD GPU architecture

2. AMD's latest work in the Triton open source community

Talk topic: TVM application practice on Muxi GPUs

Contents: This talk focuses on applying TVM to Muxi GPUs: using TVM to generate high-performance operators for Muxi GPUs, and in turn enabling mainstream AI frameworks on top of TVM.

In this talk, you will learn:

1. Problems you may encounter when adapting TVM to domestic GPGPUs

2. The benefits TVM brings on domestic GPGPUs, and the areas that still need breakthroughs

3. The current state of support for AI compilers such as TVM on domestic GPGPUs, and how to expand the related ecosystem

Talk topic: Triton-distributed: native Python programming for high-performance communication

Contents: Single-chip scaling is gradually hitting a bottleneck, and a single accelerator can no longer support the training and inference of large language models, so distributed systems have become a necessity. In a distributed system, computation, memory access, and communication happen concurrently, but existing frameworks mostly optimize each of them independently, which makes it hard to unlock the full performance of a cluster.

This talk presents Triton-distributed, an extension of the Triton compiler that is the first to advocate native overlapping optimization for distributed AI workloads and that covers the optimizations of multiple existing frameworks. By integrating OpenSHMEM communication primitives, it uses the compiler to jointly optimize the three activities above. The talk demonstrates overlapping techniques and both single-node and multi-node programming methods. The generated code makes full use of heterogeneous resources in a cluster environment and outperforms hand-optimized code, at a development cost significantly lower than that of CUDA/C++.
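The overlapping idea described above can be illustrated with a small, generic sketch: start the next data transfer before computing on the chunk that has already arrived. This is plain Python with a thread pool, not Triton-distributed or OpenSHMEM code, and `fetch_chunk`, `compute`, and `overlapped_sum_of_squares` are hypothetical placeholder names.

```python
# Sketch of overlapping communication with computation (double buffering).
# fetch_chunk stands in for a communication step; it is a hypothetical
# placeholder, not an OpenSHMEM or Triton-distributed call.
from concurrent.futures import ThreadPoolExecutor


def fetch_chunk(chunks, index):
    """Pretend to receive one chunk of data from a remote device."""
    return chunks[index]


def compute(chunk):
    """Pretend compute step: sum of squares over one chunk."""
    return sum(v * v for v in chunk)


def overlapped_sum_of_squares(chunks):
    if not chunks:
        return 0
    total = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_chunk, chunks, 0)
        for i in range(len(chunks)):
            current = pending.result()  # wait for the chunk in flight
            if i + 1 < len(chunks):
                # Start the next transfer before computing on this chunk,
                # so transfer and compute can proceed concurrently.
                pending = pool.submit(fetch_chunk, chunks, i + 1)
            total += compute(current)
    return total


total = overlapped_sum_of_squares([[1, 2], [3, 4], [5]])
```

A compiler-based approach aims to produce this kind of schedule automatically inside generated kernels, instead of leaving it to hand-written host code.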

In this talk, you will learn:

1. The latest Triton-distributed techniques

2. The challenges of programming communication from Python

3. The future direction of distributed compilation

Talk topic: TileLang: operator development without the headache, and without giving up performance

Contents: This talk introduces a new operator programming language, TileLang. Through explicit tile-level primitives and automatic inference mechanisms, it lets developers efficiently implement hardware-aware neural operators while balancing control and development efficiency. Compared with traditional compilers (such as Triton), TileLang can achieve up to a 6x performance improvement on mainstream GPUs, and it significantly simplifies the development process so that performance optimization is no longer "exclusive to experts."
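For readers unfamiliar with tile-level thinking, the following plain-Python sketch shows a matrix multiplication organized around tiles. It is not TileLang code and does not reflect TileLang syntax; `tiled_matmul` and its `tile` parameter are illustrative assumptions only.

```python
# Plain-Python sketch of tile-level matrix multiplication.
# This is NOT TileLang code; the function and parameter names are illustrative.

def tiled_matmul(a, b, tile=2):
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tile):          # tile rows of C
        for j0 in range(0, n, tile):      # tile columns of C
            for k0 in range(0, k, tile):  # walk tiles along the reduction axis
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


product = tiled_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])
```

On a GPU, each output tile would typically be assigned to a group of threads and the input tiles staged in fast on-chip memory; choosing tile sizes and data movement is the part a tile-level language tries to make explicit yet easy.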

In this talk, you will learn:

1. A simpler and more efficient language for developing high-performance operators

2. TileLang's core design concepts and technical advantages

Session 2

Roundtable Discussion

Roundtable topic: A unified compilation ecosystem across hardware

Organizers and partners

HyperAI (hyper.ai) is an internationally leading artificial intelligence and high-performance computing community. It helps developers and enthusiasts in the global data science and AI industry learn, understand, and practice by providing services such as industry news coverage, accelerated dataset downloads, online tutorial demos, performance evaluations of popular models, recommendations of cutting-edge papers, interpretations of high-value results, and an integrated calendar of top conferences, building the future of artificial intelligence together with the community.

Visit the official website: https://hyper.ai/

OpenBayes Bayesian Computing is a leading high-performance computing service provider in China. By grafting classic software ecosystems and machine learning models onto new-generation heterogeneous chips, it provides industrial enterprises and university research groups with faster and easier-to-use data science computing products. Its products have been adopted in dozens of large industrial scenarios and by leading scientific research institutes.

Visit the official website: https://openbayes.com/

The MLC.AI community was established in June 2022. Chen Tianqi, the main inventor of Apache TVM and a well-known young scholar in the field of machine learning, led the team that launched the MLC online course, which systematically introduces the key elements and core concepts of machine learning compilation.

In November 2022, through the joint efforts of MLC.AI community volunteers, the first complete Chinese-language TVM documentation was launched and hosted on the HyperAI official website, giving domestic developers interested in machine learning compilation the basic infrastructure for approaching and learning a new technology: documentation.

MLC Online Courses: https://mlc.ai/

TVM Chinese Documentation: https://tvm.hyper.ai/

Founded in April 2011, Garage Coffee is one of the earliest companies in China to focus on early-stage Internet startups. It has built a low-cost, convenient, full-factor, open innovation and entrepreneurship service platform for early-stage entrepreneurs around the concept of "mass entrepreneurship."

As the first makerspace on Beijing's Zhongguancun Entrepreneurship Street, Garage Coffee uses the coffee shop as a venue for interaction, providing startup teams with shared office space and incubation services built around sharing, mutual support, integration, and coexistence. Garage Coffee is the world's first entrepreneurship-themed coffee shop, and it is China's most influential national makerspace and international innovation and entrepreneurship platform.

Event Support

Huodongxing: Scan the QR code to go to the event registration page

Scan the QR code and add the note "AI Compiler" to join the event group

Because venue space is limited, we have opened only 200 places for this event. We recommend registering as early as possible to secure a seat.

See you on July 5th from 13:30 to 17:45!

HyperAI


