Fast User-Guided Video Object Segmentation by Interaction-and-Propagation Networks

We present a deep learning method for interactive video object segmentation. Our method is built upon two core operations, interaction and propagation, each of which is performed by a convolutional neural network (CNN). The two networks are connected both internally and externally, so that they are trained jointly and interact with each other to solve the complex video object segmentation problem. We propose a new multi-round training scheme for interactive video object segmentation that lets the networks learn to understand the user's intention and to correct wrong estimations during training. At test time, our method produces high-quality results and runs fast enough to work with users interactively. We evaluated the proposed method quantitatively on the interactive track benchmark of the DAVIS Challenge 2018, where it outperformed the competing methods by a significant margin in both speed and accuracy. We also demonstrated that our method works well with real user interactions.