Early Bird Ticket Countdown | TVM/Triton/TileLang Show off Their Skills on the Same Stage, Meet AI Compiler Invites You to Unlock the Infinite Possibilities of AI Compilers!

a year ago

HyperAI will hold the 7th Meet AI Compiler Technology Salon in Zhongguancun, Beijing, on July 5. The event brings together four senior experts from AMD, Muxi Integrated Circuit, ByteDance, and Peking University to explore cutting-edge AI compiler practice from multiple perspectives, from low-level compilation to upper-level applications. In addition, Feng Siyuan, assistant professor at Shanghai Chuangzhi College and Apache TVM PMC member, will moderate the roundtable session and lead an in-depth discussion with the speakers on the theme of "a unified compilation ecosystem across hardware."

🎫 Early bird ticket sales close at 23:30 today, so grab yours now! See you there~

We have also prepared gifts and a tea break for everyone on the day of the event. Sign up and follow the "HyperAI Super Neuro" official account to take part~

Event Details

⏰ Time: July 5 (Saturday) 13:30-17:45

📍 Location: Garage Coffee, No. 48, Haidian West Street, Haidian District, Beijing

👬 Capacity: 200 (on-site seats are limited, so please register as early as possible)

🙌🏻 Registration: Register at the link below~

https://www.huodongxing.com/event/1810501012111

Scan the QR code and add the note "AI Compiler" to join the event group:

📝 Agenda:

Guests and Agenda

Session 1: Guest Talks

Talk topic: Supporting the Open Source Community: An Analysis of the AMD Triton Compiler

Contents: Triton is a programming language proposed by OpenAI that is designed to simplify the development of high-performance GPU kernels. It is widely used in mainstream LLM inference and training frameworks. Users can implement GPU kernels by writing Triton code in Python without having to worry about the details of the underlying GPU architecture, which greatly lowers the difficulty of GPU code development.

AMD has implemented the Triton compiler on its relevant GPU platforms and contributed it to the Triton open source community. Optimizing GPU code performance requires understanding the Triton compiler and its role in kernel performance optimization. This talk will discuss the AMD Triton compiler in detail and introduce how the compiler improves the performance of Triton on AMD GPU platforms.

In this talk, you will learn:

1. An introduction to AMD GPU architecture

2. AMD's latest work in the Triton open source community

Talk topic: TVM Application Practice on Muxi GPU

Contents: This talk focuses on how to apply TVM on Muxi GPUs. Built around TVM, high-performance operators are generated for Muxi GPUs so that mainstream AI frameworks can be supported through TVM.

In this talk, you will learn:

1. Problems that may arise when adapting TVM to domestic GPGPUs

2. The benefits of TVM on domestic GPGPUs and the areas that still need breakthroughs

3. The current state of support for AI compilers such as TVM on domestic GPGPUs, and how to expand the related ecosystem

Talk topic: Triton-distributed: Native Python Programming for High-Performance Communication

Contents: Single-chip scaling is gradually reaching a bottleneck, and a single accelerator can no longer support large language model training and inference, so distributed systems have become a necessity. Computation, memory access, and communication run concurrently in distributed systems, but existing frameworks mostly optimize them independently, which makes it difficult to unlock the full performance of a cluster.

This talk presents Triton-distributed, an extension of the Triton compiler, which is the first to advocate native overlapping optimization for distributed AI workloads and covers optimization across multiple frameworks. It integrates OpenSHMEM communication primitives and uses the compiler to jointly optimize the three activities (computation, memory access, and communication). The talk demonstrates the application of overlapping techniques and both single-node and multi-node programming methods. The generated code fully utilizes heterogeneous resources in a cluster environment and outperforms hand-optimized code, at a development cost significantly lower than CUDA/C++.

In this talk, you will learn:

1. The latest Triton-distributed technology

2. The challenges of programming communication from Python

3. Future directions for distributed compilation

Talk topic: TileLang: Operator Development Is No Longer "Brain-Burning", and Performance Still Delivers

Contents: This talk introduces a new operator programming language, TileLang. Through explicit tile-level primitives and automatic inference mechanisms, it enables developers to efficiently implement hardware-aware neural operators, balancing control and development efficiency. Compared with traditional compilers (such as Triton), TileLang can achieve up to a 6x performance improvement on mainstream GPUs while significantly simplifying the development process, so that performance optimization is no longer "exclusive to experts."

In this talk, you will learn:

1. A simpler and more efficient language for developing high-performance operators

2. TileLang's core design concepts and technical advantages

Session 2: Roundtable Discussion

Roundtable topic: A unified compilation ecosystem across hardware

Organizers and partners

As a premier global community in artificial intelligence and high-performance computing, HyperAI (hyper.ai) is committed to supporting developers and enthusiasts across the global data science and AI industry by providing a comprehensive suite of services, including industry news reports, accelerated dataset downloads, online tutorials, benchmarks of leading AI models, curated recommendations of cutting-edge research papers, in-depth interpretations of high-impact results, and integration with top conference calendars. HyperAI empowers developers to explore, comprehend, and apply AI, driving innovation and shaping the future of artificial intelligence in collaboration with the community.

Visit the official website: https://hyper.ai/

OpenBayes Bayesian Computing is a leading high-performance computing service provider in China. By grafting classic software ecosystems and machine learning models onto new-generation heterogeneous chips, it provides industrial enterprises and university research groups with faster and easier-to-use data science computing products. Its products have been adopted in dozens of large industrial scenarios and by leading scientific research institutes.

Visit the official website: https://openbayes.com/

The MLC.AI community was established in June 2022. Chen Tianqi, the main inventor of Apache TVM and a well-known young scholar in the field of machine learning, led the team that launched the MLC online course, which systematically introduces the key elements and core concepts of machine learning compilation.

In November 2022, through the joint efforts of MLC.AI community volunteers, the first complete TVM Chinese documentation was launched and hosted on the HyperAI official website, giving domestic developers interested in machine learning compilation the basic infrastructure for approaching and learning a new technology: documentation.

MLC Online Courses: https://mlc.ai/

TVM Chinese Documentation: https://tvm.hyper.ai/

Founded in April 2011, Garage Coffee is the first company in China to focus on early-stage Internet startups. Centered on "mass entrepreneurship", it has built a low-cost, convenient, full-factor, open innovation and entrepreneurship service platform for early-stage entrepreneurs.

As the first makerspace on Beijing's Zhongguancun Entrepreneurship Street, Garage Coffee uses the coffee shop as an interactive setting to provide entrepreneurial teams with shared office space and incubation services built on sharing, mutual promotion, integration, and coexistence. Garage Coffee is the world's first entrepreneurship-themed coffee shop, and is China's most influential national makerspace and international innovation and entrepreneurship platform.

Event Support

Huodongxing: Scan the QR code to go to the event registration page

Scan the QR code and add the note "AI Compiler" to join the event group

Because venue space is limited, only 200 seats are available for this event. We recommend registering as early as possible to secure a seat.

See you on July 5th from 13:30 to 17:45!
