
Technical Salon | Shanghai Innovation Lab, TileAI, Huawei, Advanced Compiler Lab, and AI9Stars Gather in Shanghai for In-Depth Analysis of the Entire Chain of Operator Optimization Practices

8 hours ago
Information
h.li

The 8th Meet AI Compiler will be held in Shanghai on December 27th! In this issue, we have invited experts from the Shanghai Innovation Academy, the TileAI community, Huawei HiSilicon, the Advanced Compiler Lab, and the AI9Stars community to share their insights across the entire technology chain, from software stack design and operator development to performance optimization. The content covers cross-ecosystem interoperability with TVM, fusion operator optimization in PyPTO, the low-latency TileRT system, key Triton optimization techniques for multiple architectures, and operator optimization with AutoTriton, presenting a complete technical path from theory to implementation.

Registration is now open, seats are limited! Come join us for valuable insights – we'll be waiting for you in Shanghai! 🫶

Event Details

⏰ Time: December 27th (Saturday) 13:30-17:30

📍 Location: Shanghai Innovation Academy, No. 3, Lane 699, Huafa Road, Xuhui District, Shanghai

👬 Number of participants: 150 (Limited seating available, please register as soon as possible) 

🙌🏻 Register: https://hdxu.cn/1CupU

The full agenda is as follows ⬇️

Guests and topics of discussion

Sharing guests

13:40-17:20

Share topic: TVM FFI: Open ABI and FFI for Machine Learning Systems

Contents: TVM FFI aims to solve the problems of ecosystem fragmentation and interoperability in machine learning systems. By defining open ABI and FFI standards, the project uses a stable C ABI and DLPack to achieve zero-copy data transfer, bridging frameworks such as PyTorch and the underlying compilers. It supports efficient cross-language calls and significantly reduces the engineering cost of multi-platform adaptation.

Watch this sharing session and you will learn:

1. Learn the TVM FFI universal standard, which significantly reduces the development and maintenance costs of cross-language ML systems.

2. Understand how to build a future-compatible, modular ML ecosystem.
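As a concrete illustration of the zero-copy exchange described above (a minimal sketch, not code from the talk), the snippet below uses the DLPack protocol that TVM FFI builds on. NumPy stands in for both sides of the exchange here; frameworks such as PyTorch implement the same `__dlpack__` protocol, so a torch tensor could serve as the producer in the same way.

```python
import numpy as np

# DLPack is the tensor-exchange protocol behind zero-copy
# interoperability: a producer exposes its buffer via __dlpack__,
# and a consumer wraps that buffer without copying the data.
src = np.arange(6, dtype=np.float32)

# np.from_dlpack consumes any object implementing the DLPack
# protocol (a PyTorch tensor would work the same way here).
view = np.from_dlpack(src)

# The two arrays share one underlying buffer: no copy was made.
assert np.shares_memory(src, view)
print(view.dtype, view.shape)  # float32 (6,)
```

Because only a pointer and metadata cross the boundary, the same mechanism lets a compiler-generated kernel operate directly on a framework's tensors.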

Share topic: TileRT: A Software and Hardware Exploration for Low-Latency Large Model Inference

Contents: As large models reach trillions of parameters and process sequences exceeding millions of tokens, their capabilities keep breaking records, yet the pursuit of ultimate computational speed has never ceased. On one hand, many low-latency scenarios, such as real-time decision-making and game theory, require responses within seconds or even milliseconds. On the other hand, with the advent of the Agent era in large model training, the rollout time for extremely long sequences has become a major bottleneck.

This report introduces the TileRT project, exploring how to build a software stack for large-scale model computation with extremely low latency, from the perspectives of AI compilers, runtime, and architecture design.

Watch this sharing session and you will learn:

1. Understand the background, importance, and future prospects of low-latency inference scenarios for large models.

2. TileRT's technical challenges and practical experience

Share topic: PyPTO: A framework for developing fusion operators based on white-box compilation

Contents: This presentation focuses on PyPTO, Huawei's newly launched fusion operator development framework. Based on the Tensor/Tile programming paradigm, it balances high performance and ease of use through technologies such as in-core SRAM management, a cross-platform PTO instruction set, and an MPMD runtime, combined with Human-in-the-Loop tuning and white-box compilation.

Watch this sharing session and you will learn:

1. Master the design philosophy and core architecture of PyPTO, a fusion operator development framework natively designed for SIMD architecture.

2. Master PyPTO's white-box compilation philosophy, which focuses on leveraging users' expert experience, and the essence of Human-In-The-Loop optimization.

3. Master the complete process of quickly developing high-performance fusion operators on the Ascend platform using the visualization tools provided by PyPTO.

Share topic: Compilation optimization practices for the Triton compiler

Contents: This presentation focuses on optimization practices for the Triton compiler, systematically introducing Triton's language and compiler structure, ecosystem evolution, and operator library development methods. It also delves into key optimization techniques for multiple architectures, including CPUs, NPUs, and GPUs, demonstrating a complete path to building a high-performance unified operator system.

Watch this sharing session and you will learn:

1. Latest developments in the Triton ecosystem

2. Key optimization techniques of the Triton compiler on multiple architectures (CPU/NPU/GPU)

Share topic: AutoTriton: Exploring Triton Operator Optimization Techniques for Large Models Driven by Reinforcement Learning

Contents: Writing efficient kernels in languages like CUDA has long been the domain of performance engineers. With programming frameworks like Triton, kernel programmability has taken a significant leap, but developers still need to manually configure key parameters, which limits performance portability and widespread adoption. This report will introduce our exploration of benchmarks and models for large-scale operator generation and discuss the enormous potential of large models in operator optimization.

Watch this sharing session and you will learn:

1. Related work and the latest progress in large-model-driven operator optimization

2. Key technologies of large models in operator optimization

Organizers and partners

HyperAI (hyper.ai) is an internationally leading artificial intelligence and high-performance computing community. It aims to help developers and enthusiasts across the global data science and AI industry learn, understand, and practice by providing services such as industry news reports, accelerated dataset downloads, online tutorial demonstrations, popular model performance evaluations, cutting-edge paper recommendations, interpretations of high-value results, and an integrated calendar of top conferences, building the future of artificial intelligence together with the community.

Visit the official website: https://hyper.ai/

OpenBayes Bayesian Computing is a leading high-performance computing service provider in China. By grafting classic software ecosystems and machine learning models onto new-generation heterogeneous chips, it provides industrial enterprises and university research with faster, easier-to-use data science computing products. Its products have been adopted in dozens of large-scale industrial scenarios and by leading research institutes.

Visit the official website: https://openbayes.com/

The MLC.AI community was established in June 2022. Chen Tianqi, the main inventor of Apache TVM and a well-known young scholar in the field of machine learning, led the team to launch the MLC online course, which systematically introduced the key elements and core concepts of machine learning compilation.

In November 2022, through the joint efforts of MLC.AI community volunteers, the first complete Chinese TVM documentation was launched and hosted on the HyperAI official website, providing domestic developers interested in machine learning compilation with the basic infrastructure for accessing and learning the technology: documentation.

MLC online courses: https://mlc.ai/

TVM Chinese documentation: https://tvm.hyper.ai/

Shanghai Innovation Academy is a new type of talent training institution jointly built by top universities, leading enterprises, and research institutions. Adhering to the training philosophy of "student-centered and cutting-edge research," the academy explores a uniquely Chinese AI leadership talent training program through exceptional faculty, extraordinary training measures, and outstanding support conditions. It is committed to cultivating leading AI talents in China and building a world-class innovation hub for artificial intelligence.

Event Support

Given the limited space at the venue, we have opened only 150 seats. We recommend registering as soon as possible to secure your place.

See you there on December 27th from 13:30 to 17:30!
