Graph neural networks with configuration cross-attention for tensor compilers

With the recent popularity of neural networks comes the need for efficient serving of inference workloads. A neural network inference workload can be represented as a computational graph with nodes as operators transforming multidimensional tensors. The tensors can be transposed and/or tiled in a combinatorially large number of ways, with some configurations leading to accelerated inference. We propose TGraph, a neural graph architecture that allows screening for fast configurations of the target computational graph, thus representing an artificial intelligence (AI) tensor compiler, in contrast to traditional heuristics-based compilers. The proposed solution improves mean Kendall's $\tau$ across layout collections of TpuGraphs from 29.8% for the reliable baseline to 67.4% for TGraph. We estimate the potential CO$_2$ emission reduction associated with our work to be equivalent to over 50% of the total household emissions in the areas hosting AI-oriented data centers.
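As a rough illustration of the setup described above, the sketch below scores candidate layout configurations of a computational graph with a cross-attention module (configurations attend to graph node embeddings) and evaluates the resulting ranking with Kendall's $\tau$, the metric used in the abstract. This is a minimal sketch under assumed interfaces, not the paper's implementation: the module and variable names (`ConfigCrossAttention`, `node_emb`, `cfg_emb`) and the toy runtimes are hypothetical.

```python
# Hypothetical sketch: configuration cross-attention for ranking layouts.
# Not the paper's architecture; names and shapes are illustrative assumptions.
import torch
import torch.nn as nn
from scipy.stats import kendalltau

class ConfigCrossAttention(nn.Module):
    """Score candidate layout configurations against a graph encoding."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Configurations act as queries; graph node embeddings as keys/values.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.score = nn.Linear(dim, 1)  # one scalar (predicted cost) per config

    def forward(self, node_emb: torch.Tensor, cfg_emb: torch.Tensor) -> torch.Tensor:
        # node_emb: (1, num_nodes, dim), e.g. the output of a GNN encoder
        # cfg_emb:  (1, num_configs, dim), embedded layout configurations
        attended, _ = self.attn(query=cfg_emb, key=node_emb, value=node_emb)
        return self.score(attended).squeeze(-1)  # (1, num_configs)

# Toy usage: rank 8 candidate configurations of a 16-node graph.
torch.manual_seed(0)
model = ConfigCrossAttention(dim=32)
node_emb = torch.randn(1, 16, 32)
cfg_emb = torch.randn(1, 8, 32)
pred_cost = model(node_emb, cfg_emb)[0].detach().numpy()

# Evaluation mirrors the abstract's metric: Kendall's tau between predicted
# costs and measured runtimes (fabricated here purely for illustration).
measured_runtime = [1.2, 0.9, 1.5, 1.1, 0.8, 1.4, 1.0, 1.3]
tau, _ = kendalltau(pred_cost, measured_runtime)
print(f"Kendall's tau = {tau:.3f}")
```

Cross-attention is a natural fit here because the set of candidate configurations varies per graph: using configurations as queries lets a single model score an arbitrary number of layouts against one shared graph encoding.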