TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs

Precise hardware performance models play a crucial role in code optimizations. They can assist compilers in making heuristic decisions or aid autotuners in identifying the optimal configuration for a given program. For example, the autotuner for XLA, a machine learning compiler, discovered 10-20% speedup on state-of-the-art models serving substantial production traffic at Google. Although there exist a few datasets for program performance prediction, they target small sub-programs such as basic blocks or kernels. This paper introduces TpuGraphs, a performance prediction dataset on full tensor programs, represented as computational graphs, running on Tensor Processing Units (TPUs). Each graph in the dataset represents the main computation of a machine learning workload, e.g., a training epoch or an inference step. Each data sample contains a computational graph, a compilation configuration, and the execution time of the graph when compiled with the configuration. The graphs in the dataset are collected from open-source machine learning programs, featuring popular model architectures, e.g., ResNet, EfficientNet, Mask R-CNN, and Transformer. TpuGraphs provides 25x more graphs than the largest graph property prediction dataset (with comparable graph sizes), and 770x larger graphs on average compared to existing performance prediction datasets on machine learning programs. This graph-level prediction task on large graphs introduces new challenges in learning, ranging from scalability and training efficiency to model quality.
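To make the sample layout concrete, the following is a minimal sketch of how one such record could be represented. The field names and encodings (node_opcodes, edges, config) are illustrative assumptions, not the dataset's published schema; they only mirror the abstract's description of a sample as a graph, a configuration, and a measured runtime.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ComputationGraph:
    # Hypothetical encoding: one opcode ID per node (e.g., conv, dot, reshape).
    node_opcodes: List[int]
    # Directed edges as (producer, consumer) node index pairs.
    edges: List[Tuple[int, int]]

@dataclass
class Sample:
    graph: ComputationGraph   # the tensor program's computational graph
    config: List[int]         # compilation configuration (tunable knob values)
    runtime_seconds: float    # measured execution time when compiled with config

def speedup(baseline: Sample, candidate: Sample) -> float:
    """Relative speedup of one configuration over another on the same graph."""
    return baseline.runtime_seconds / candidate.runtime_seconds
```

In this framing, a learned model f(graph, config) -> predicted runtime lets an autotuner rank candidate configurations without compiling and executing each one on real hardware.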