Proposal, Tracking and Segmentation (PTS): A Cascaded Network for Video Object Segmentation
Video object segmentation (VOS) aims at pixel-level object tracking given only the annotations in the first frame. Due to the large visual variations of objects in video and the lack of training samples, it remains a difficult task despite the rapid development of deep learning. Toward solving the VOS problem, we bring in several new insights through a proposed unified framework consisting of object proposal, tracking and segmentation components. The object proposal network transfers objectness information as generic knowledge into VOS; the tracking network identifies the target object from the proposals; and the segmentation network operates on the tracking results with a novel dynamic-reference based model adaptation scheme. Extensive experiments on the DAVIS'17 and YouTube-VOS datasets show that our method achieves state-of-the-art performance on several video object segmentation benchmarks. We make the code publicly available at https://github.com/sydney0zq/PTSNet.
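
The cascade described above (propose, then track, then segment) can be illustrated with a minimal control-flow sketch. This is not the released PTSNet code: the names `run_cascade`, `propose` and `segment` are hypothetical, the learned tracking network is replaced here by a plain highest-IoU match against the previous target box, and the dynamic-reference model adaptation is omitted entirely.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Frame = TypeVar("Frame")
Mask = TypeVar("Mask")


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def run_cascade(
    frames: Sequence[Frame],
    init_box: Box,
    propose: Callable[[Frame], List[Box]],   # hypothetical proposal stage
    segment: Callable[[Frame, Box], Mask],   # hypothetical segmentation stage
) -> List[Mask]:
    """Run the three stages frame by frame and collect one mask per frame."""
    target = init_box  # box derived from the first-frame annotation
    masks: List[Mask] = []
    for frame in frames:
        # Stage 1: generic object proposals for the current frame.
        proposals = propose(frame)
        # Stage 2: stand-in tracker (assumption): keep the proposal that
        # overlaps most with the previous target; keep the old box if
        # no proposals were returned.
        if proposals:
            target = max(proposals, key=lambda p: iou(p, target))
        # Stage 3: segment the target inside the tracked box.
        masks.append(segment(frame, target))
    return masks
```

In this sketch each stage is passed in as a callable, so any detector or segmentation model can be substituted without changing the loop; only the ordering of the three stages is taken from the abstract.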