How to Optimize GEMM on CPU

This tutorial demonstrates how to use TVM to optimize matrix multiplication (GEMM) on a CPU so that it runs 200 times faster than the baseline, using only 18 lines of code.
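For orientation, the operation being optimized is the matrix product C = A × B. The following is a minimal sketch of an unoptimized version in plain Python. It is not the baseline measured in this tutorial and does not use TVM; the function name `naive_gemm` and the list-of-lists representation are assumptions chosen purely for illustration.

```python
def naive_gemm(a, b):
    """Compute C = A x B with three nested loops and no optimization.

    `a` is an M x K matrix and `b` is a K x N matrix, both given as
    lists of rows. This is an illustrative helper, not part of TVM.
    """
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions of A and B must match")
    n = len(b[0])

    # Result matrix, initialized to zeros.
    c = [[0.0] * n for _ in range(m)]

    for i in range(m):          # rows of A
        for j in range(n):      # columns of B
            acc = 0.0
            for p in range(k):  # reduction over the shared dimension
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c


if __name__ == "__main__":
    print(naive_gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

A loop nest of this shape performs M × N × K multiply-adds. How quickly a CPU executes them depends heavily on memory access order and on whether the inner loop can be vectorized, which is why restructuring the loops can change performance so much.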