Compile and Optimize the Model Using TVMC

Contents at a glance: This section explains how to use TVMC to compile and optimize a model. TVMC is the command-line driver for TVM; it exposes TVM features through the command line. This section lays the groundwork for understanding how TVM works.
Keywords: TVMC, TVM, machine learning
This section introduces TVMC, the command-line driver for TVM. TVMC exposes TVM features such as auto-tuning, compiling, profiling, and executing models through a command-line interface.
After completing this section, you will be able to use TVMC to:
- Compile the pre-trained ResNet-50 v2 model for TVM runtime.
- Run a real image through the compiled model, and interpret the output and model performance.
- Tune the model on a CPU using TVM.
- Recompile an optimized model using the tuning data collected by TVM.
- Run the image through the optimized model, and compare the output and model performance.
This section provides an overview of the functions of TVM and TVMC and lays the foundation for understanding how TVM works.
Using TVMC
TVMC is a Python application and part of the TVM Python package. When you install TVM using a Python package, you get TVMC as a command-line application called tvmc. The location of this command varies depending on your platform and installation method.
Alternatively, if TVM is available as a Python module on your $PYTHONPATH, you can access the command-line driver functionality through the executable Python module, python -m tvm.driver.tvmc.
For simplicity, this tutorial writes TVMC commands as tvmc <options>; the same results can be obtained with python -m tvm.driver.tvmc <options>.
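If you drive TVMC from a script, the choice between the two forms can be made at run time. The following is a minimal sketch, not part of TVMC itself: the helper name tvmc_command is hypothetical, and it only checks whether a tvmc executable is on PATH before falling back to the module form.

```python
import shutil
import sys


def tvmc_command():
    """Return the argv prefix for launching TVMC (hypothetical helper).

    Prefers the tvmc executable when it is on PATH; otherwise falls back
    to the module form, which needs TVM importable by this interpreter.
    """
    executable = shutil.which("tvmc")
    if executable is not None:
        return [executable]
    return [sys.executable, "-m", "tvm.driver.tvmc"]


if __name__ == "__main__":
    # Print the command line that would display the help page.
    print(" ".join(tvmc_command() + ["--help"]))
```

The returned list can be extended with a subcommand and its options and passed to subprocess.run.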
Use the following command to view the help page:
tvmc --help
The main features of TVM available through tvmc are the subcommands compile, run, and tune. To view the specific options for a given subcommand, use tvmc <subcommand> --help.
This tutorial will introduce these commands. Before you begin, please download a pre-trained model.
Get the model
In this tutorial, we will use ResNet-50 v2. ResNet-50 is a 50-layer deep convolutional neural network for image classification. The model we will use has been pre-trained on more than 1 million images in 1000 different categories. The network expects input images of size 224×224.
We recommend downloading Netron (a free ML model viewer) if you want to explore the structure of the ResNet-50 model in more depth.
Download Netron: https://netron.app/
This tutorial uses the model in ONNX format:
wget https://github.com/onnx/models/raw/b9a54e89508f101a1611cd64f4ef56b9cb62c7cf/vision/classification/resnet/model/resnet50-v2-7.onnx
Tip 1: Supported model formats
TVMC supports models created with Keras, ONNX, TensorFlow, TFLite, and Torch. Use the --model-format option to specify the model format explicitly. See tvmc compile --help for more information.
Tip 2: Adding ONNX support to TVM
TVM depends on the ONNX Python library being available on your system. Install ONNX with the command pip3 install --user onnx onnxoptimizer. If you have root access and want to install ONNX globally, you can remove the --user option. The onnxoptimizer dependency is optional and is only used for onnx>=1.9.
Compile ONNX model to TVM Runtime
After downloading the ResNet-50 model, compile it with tvmc compile. The output of the compilation is a TAR package containing the model compiled into a dynamic library for the target platform. You can run that model on the target device using the TVM runtime:
# This may take several minutes, depending on your machine
tvmc compile \
--target "llvm" \
--input-shapes "data:[1,3,224,224]" \
--output resnet50-v2-7-tvm.tar \
resnet50-v2-7.onnx
Look at the files that tvmc compile creates in the module:
mkdir model
tar -xvf resnet50-v2-7-tvm.tar -C model
ls model
Three files are listed after extraction:
- mod.so is the model, represented as a C++ library, that can be loaded by the TVM runtime.
- mod.json is a text representation of the TVM Relay computation graph.
- mod.params is a file containing the parameters for the pre-trained model.
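The same check can be made from Python without extracting the package. This is a minimal sketch using only the standard tarfile module; the helper name missing_members is hypothetical, and the expected file names are the three listed above.

```python
import posixpath
import tarfile

# File names taken from the listing above.
EXPECTED_MEMBERS = {"mod.so", "mod.json", "mod.params"}


def missing_members(package_path, expected=EXPECTED_MEMBERS):
    """Return the expected file names that are absent from the TAR package."""
    with tarfile.open(package_path) as package:
        # Normalize paths so that "mod.so" and "./mod.so" compare equal.
        present = {posixpath.normpath(name) for name in package.getnames()}
    return sorted(set(expected) - present)
```

For a package like the one built above, missing_members("resnet50-v2-7-tvm.tar") would be expected to return an empty list.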
Your application can load this module directly, and the model can be run through the TVM runtime APIs.
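As a rough illustration of that path, the sketch below loads the three extracted files with the graph executor and runs one inference. It rests on several assumptions: a TVM release that ships tvm.contrib.graph_executor (older releases named it graph_runtime), files extracted into a directory such as model, a CPU device, and the input name data used in the compile command above. The function name run_compiled_model is hypothetical.

```python
def run_compiled_model(model_dir, input_array, input_name="data"):
    """Run one inference with the files extracted from a TVMC package."""
    # Imported lazily so this file can be read without TVM installed.
    import tvm
    from tvm.contrib import graph_executor

    # mod.so holds the compiled operators for the target platform.
    lib = tvm.runtime.load_module(f"{model_dir}/mod.so")
    # mod.json describes the computation graph.
    with open(f"{model_dir}/mod.json") as graph_file:
        graph_json = graph_file.read()
    # mod.params holds the serialized pre-trained weights.
    with open(f"{model_dir}/mod.params", "rb") as params_file:
        params = bytearray(params_file.read())

    module = graph_executor.create(graph_json, lib, tvm.cpu(0))
    module.load_params(params)
    module.set_input(input_name, input_array)
    module.run()
    return module.get_output(0).numpy()
```

This is only meant to show what an application would do with the three files; tvmc run, described next, wraps these steps for you.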
Tip 3: Defining the correct target
Specifying the correct target (option --target) can have a large impact on the performance of the compiled module, because the compiler can then take advantage of hardware features available on the target. See Automatically Tuning Convolutional Networks for x86 CPUs for more information. We recommend finding out which CPU you are running, along with its optional features, and setting the target accordingly.
Run models from compiled modules using TVMC
After compiling the model into a module, you can use the TVM runtime to make predictions with it. TVMC has the TVM runtime built in, which lets you run compiled TVM models.
To run a model and make predictions using TVMC, you need:
- The compiled module just generated.
- Valid inputs to the model used for prediction.
The shape, format, and data type of tensors vary between models. Therefore, most models require some pre-processing and post-processing to ensure that the input is valid and the output can be interpreted. TVMC uses NumPy's .npz format for both input and output data; this well-supported format serializes multiple arrays into a single file.
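To make the .npz format concrete, here is a small round trip that stores two named arrays in one archive and reads them back. It is a sketch only: the array names and shapes are placeholders, and an in-memory buffer stands in for a file on disk.

```python
import io

import numpy as np

# Two arrays standing in for model inputs; the names are hypothetical.
arrays = {
    "data": np.zeros((1, 3, 224, 224), dtype="float32"),
    "mask": np.ones((1, 224, 224), dtype="uint8"),
}

# np.savez stores each keyword argument as a named array in one archive.
buffer = io.BytesIO()
np.savez(buffer, **arrays)
buffer.seek(0)

# np.load returns a mapping-like object keyed by those names.
with np.load(buffer) as archive:
    loaded = {name: archive[name] for name in archive.files}

print({name: value.shape for name, value in loaded.items()})
```

The key of each array matters: TVMC matches it against the input name of the model, which is data for the model in this tutorial.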
The image input used in this tutorial is an image of a cat, but you can use any image you like.

Input preprocessing
The input to the ResNet-50 v2 model should be in ImageNet format. Below is an example script for preprocessing images for ResNet-50 v2.
First, install the Python Imaging Library, which the script depends on, with pip3 install --user pillow.
#!python ./preprocess.py
from tvm.contrib.download import download_testdata
from PIL import Image
import numpy as np
img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
img_path = download_testdata(img_url, "imagenet_cat.png", module="data")
# Resize to 224x224
resized_image = Image.open(img_path).resize((224, 224))
img_data = np.asarray(resized_image).astype("float32")
# The ONNX model expects NCHW input, so transpose the array from HWC to CHW
img_data = np.transpose(img_data, (2, 0, 1))
# Normalize according to the ImageNet statistics
imagenet_mean = np.array([0.485, 0.456, 0.406])
imagenet_stddev = np.array([0.229, 0.224, 0.225])
norm_img_data = np.zeros(img_data.shape).astype("float32")
for i in range(img_data.shape[0]):
    norm_img_data[i, :, :] = (img_data[i, :, :] / 255 - imagenet_mean[i]) / imagenet_stddev[i]
# Add the batch dimension
img_data = np.expand_dims(norm_img_data, axis=0)
# Save to .npz (writes imagenet_cat.npz)
np.savez("imagenet_cat", data=img_data)
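The per-channel loop in the script can also be written with NumPy broadcasting. The following sketch is an alternative formulation, not the tutorial's own code; the function name normalize_chw is hypothetical, and the mean and standard deviation values are the ones used in the script.

```python
import numpy as np

imagenet_mean = np.array([0.485, 0.456, 0.406], dtype="float32")
imagenet_stddev = np.array([0.229, 0.224, 0.225], dtype="float32")


def normalize_chw(img_chw):
    """Scale a CHW image from [0, 255] to [0, 1], then standardize per channel."""
    # Reshape the statistics to (3, 1, 1) so they broadcast over H and W.
    mean = imagenet_mean.reshape(3, 1, 1)
    stddev = imagenet_stddev.reshape(3, 1, 1)
    return ((img_chw / 255.0 - mean) / stddev).astype("float32")
```

Calling normalize_chw(img_data) after the transpose step should give the same array as the loop, up to float32 rounding.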
Run the compiled module
With the model and input data, let's run TVMC to make predictions:
tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz \
resnet50-v2-7-tvm.tar
The .tar model file includes a C++ library, a description of the Relay model, and the parameters for the model. TVMC includes the TVM runtime, which can load the model and make predictions against the input. When you run the above command, TVMC outputs a new file, predictions.npz, that contains the model output tensors in NumPy format.
In this example, we run the model on the same machine that we used for compilation. In some cases you may want to run it remotely through an RPC Tracker. See tvmc run --help to learn more about these options.
Output post-processing
As mentioned before, each model has its own way of providing output tensors.
In this example, we need to post-process the output of ResNet-50 v2 into a more human-readable form, using the lookup table provided for the model.
The following script shows an example of post-processing that extracts labels from the output of the compiled module:
#!python ./postprocess.py
import os.path
import numpy as np
from scipy.special import softmax
from tvm.contrib.download import download_testdata
# Download the list of labels
labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt"
labels_path = download_testdata(labels_url, "synset.txt", module="data")
with open(labels_path, "r") as f:
    labels = [l.rstrip() for l in f]
output_file = "predictions.npz"
# Open the output file and read the output tensor
if os.path.exists(output_file):
    with np.load(output_file) as data:
        scores = softmax(data["output_0"])
        scores = np.squeeze(scores)
        ranks = np.argsort(scores)[::-1]
        for rank in ranks[0:5]:
            print("class='%s' with probability=%f" % (labels[rank], scores[rank]))
The output of running this script is as follows:
python postprocess.py
# class='n02123045 tabby, tabby cat' with probability=0.610553
# class='n02123159 tiger cat' with probability=0.367179
# class='n02124075 Egyptian cat' with probability=0.019365
# class='n02129604 tiger, Panthera tigris' with probability=0.001273
# class='n04040759 radiator' with probability=0.000261
Replace the above cat image with other images and see what predictions the ResNet model makes.
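When experimenting with several images, it can help to wrap the ranking step in a reusable function. The sketch below is one possible version that needs only NumPy rather than SciPy; the name top_k is hypothetical, and the softmax is written out by hand.

```python
import numpy as np


def top_k(logits, labels, k=5):
    """Return the k most probable (label, probability) pairs."""
    logits = np.squeeze(np.asarray(logits, dtype="float64"))
    # Subtracting the maximum keeps exp() from overflowing; softmax is unchanged.
    exp = np.exp(logits - logits.max())
    probabilities = exp / exp.sum()
    # Indices of the k largest probabilities, highest first.
    order = np.argsort(probabilities)[::-1][:k]
    return [(labels[i], float(probabilities[i])) for i in order]
```

With the labels list and the output_0 array from the script above, top_k(data["output_0"], labels) would return the same five classes that the script prints.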
Automatically tune the ResNet model
The previous model was compiled to run on the TVM runtime, but it did not include any platform-specific optimization. This section shows how to use TVMC to build an optimized model targeting your working platform.
In some cases, you might not get the expected performance when running inference with the compiled module. In such cases, you can use the auto-tuner to find a better configuration for the model and improve performance. Tuning in TVM means optimizing a model to run faster on a given target. Unlike training or fine-tuning, it does not affect the accuracy of the model, only its runtime performance.
As part of the tuning process, TVM implements and runs many variants of each operator to see which performs best. The results of these runs are stored in a tuning records file, which is the final output of the tune subcommand.
In its simplest form, tuning requires you to provide three things:
- The target specification of the device you intend to run this model on.
- The path to an output file in which the tuning records will be stored.
- The path to the model to be tuned.
The following example demonstrates its workflow:
# The default search algorithm requires xgboost; see below for more on tuning search algorithms
pip install xgboost
tvmc tune \
--target "llvm" \
--output resnet50-v2-7-autotuner_records.json \
resnet50-v2-7.onnx
In this example, you will get better results if you pass a more specific target to the --target flag. For example, on an Intel i7 processor you could use --target "llvm -mcpu=skylake". This tuning example uses LLVM as the compiler for the specified architecture and tunes locally on the CPU.
TVMC searches the parameter space of the model, trying out different configurations for the operators and choosing the one that runs fastest on your platform. Although this search is guided by the CPU and the model's operations, it can still take several hours to complete. The output of the search is saved to the resnet50-v2-7-autotuner_records.json file, which is later used to compile an optimized model.
Tip 4: Defining the tuning search algorithm
By default, this search is guided by the XGBoost Grid algorithm. Depending on your model's complexity and the time available, you may want to choose a different algorithm. A complete list is available in tvmc tune --help.
For a consumer Skylake CPU, the output is:

Compile and optimize the model using tuning data
The output of the tuning process above is a file of tuning records, resnet50-v2-7-autotuner_records.json.
This file can be used in two ways:
- As input for further tuning (via tvmc tune --tuning-records).
- As input to the compiler.
Running tvmc compile --tuning-records tells the compiler to use these results to generate high-performance code for the model on the specified target. See tvmc compile --help for more information.
Now that tuning data for the model has been collected, we can recompile the model with the optimized operators to speed up computation:
tvmc compile \
--target "llvm" \
--tuning-records resnet50-v2-7-autotuner_records.json \
--output resnet50-v2-7-tvm_autotuned.tar \
resnet50-v2-7.onnx
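The tuned and untuned builds differ only in the --tuning-records option. If you script both builds, that difference can be captured in one place. This is a minimal sketch; the helper name compile_command is hypothetical, and it uses only the options shown in this tutorial.

```python
import shlex


def compile_command(model, output, target="llvm", tuning_records=None):
    """Build the argument list for tvmc compile (hypothetical helper)."""
    command = ["tvmc", "compile", "--target", target]
    if tuning_records is not None:
        # Only the tuned build passes the records file.
        command += ["--tuning-records", tuning_records]
    command += ["--output", output, model]
    return command


if __name__ == "__main__":
    tuned = compile_command(
        "resnet50-v2-7.onnx",
        "resnet50-v2-7-tvm_autotuned.tar",
        tuning_records="resnet50-v2-7-autotuner_records.json",
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(tuned))
```

The returned list can be passed directly to subprocess.run(command, check=True) to launch the build.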
Verify that the optimized model runs and produces the same results:
tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz \
resnet50-v2-7-tvm_autotuned.tar
python postprocess.py
Verify that the predictions match the earlier run, up to small numerical differences:
# class='n02123045 tabby, tabby cat' with probability=0.610550
# class='n02123159 tiger cat' with probability=0.367181
# class='n02124075 Egyptian cat' with probability=0.019365
# class='n02129604 tiger, Panthera tigris' with probability=0.001273
# class='n04040759 radiator' with probability=0.000261
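Because tuning changes how operators are implemented, the probabilities can differ in the last digits. A quick way to confirm that the two runs agree is to compare them under a tolerance. The sketch below uses the ten probabilities printed in this tutorial; the tolerance value is an assumption and should be chosen to suit your application.

```python
import math

# Top-5 probabilities printed in this tutorial, before and after tuning.
untuned = [0.610553, 0.367179, 0.019365, 0.001273, 0.000261]
tuned = [0.610550, 0.367181, 0.019365, 0.001273, 0.000261]

# The tolerance is an assumption, not a value taken from TVM.
ABS_TOL = 1e-4

max_diff = max(abs(a - b) for a, b in zip(untuned, tuned))
matches = all(math.isclose(a, b, abs_tol=ABS_TOL) for a, b in zip(untuned, tuned))
print(f"max difference: {max_diff:.6f}, within tolerance: {matches}")
```

In practice you would load both predictions.npz files and compare the full output tensors in the same way, for example with numpy.allclose.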
Comparing tuned and untuned models
TVMC provides basic tools for benchmarking models against each other. You can specify a number of repetitions and have TVMC report the model's run time (independent of runtime startup). This gives a rough idea of how much tuning has improved the model's performance.
For example, on a test Intel i7 system, the mean run time of the tuned model is about half that of the untuned model (92.19 ms versus 193.32 ms):
tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz \
--print-time \
--repeat 100 \
resnet50-v2-7-tvm_autotuned.tar
# Execution time summary:
# mean (ms) max (ms) min (ms) std (ms)
# 92.19 115.73 89.85 3.15
tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz \
--print-time \
--repeat 100 \
resnet50-v2-7-tvm.tar
# Execution time summary:
# mean (ms) max (ms) min (ms) std (ms)
# 193.32 219.97 185.04 7.11
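The improvement can be expressed either as a speedup factor or as a reduction in run time. The arithmetic below uses the two mean values reported above; the variable names are illustrative.

```python
untuned_mean_ms = 193.32  # resnet50-v2-7-tvm.tar
tuned_mean_ms = 92.19  # resnet50-v2-7-tvm_autotuned.tar

# How many times faster the tuned model completes one inference.
speedup = untuned_mean_ms / tuned_mean_ms
# Fraction of the original run time that tuning removed.
time_saved = 1 - tuned_mean_ms / untuned_mean_ms

print(f"speedup: {speedup:.2f}x, run time reduced by {time_saved:.1%}")
# prints "speedup: 2.10x, run time reduced by 52.3%"
```

These figures come from a single machine and a single image, so treat them as a rough indication rather than a benchmark.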
Final Thoughts
This tutorial introduced TVMC, the command-line driver for TVM. It demonstrated how to compile, run, and tune a model, and discussed the need for pre- and post-processing of inputs and outputs. After tuning, it showed how to compare the performance of the unoptimized and optimized models.
This document shows a simple example of using ResNet-50 v2 locally. However, TVMC supports more features including cross-compilation, remote execution, and profiling/benchmarking.
Use the tvmc --help command to see other available options.
The next tutorial, Compiling and Optimizing a Model with the Python Interface, will introduce the same compilation and optimization steps using the Python interface.