FODVid: Flow-guided Object Discovery in Videos

Segmentation of objects in a video is challenging due to nuances such as motion blur, parallax, occlusions, changes in illumination, etc. Instead of addressing these nuances separately, we focus on building a generalizable solution that avoids overfitting to the individual intricacies. Such a solution would also help save the enormous resources involved in human annotation of video corpora. To solve Video Object Segmentation (VOS) in an unsupervised setting, we propose a new pipeline (FODVid) based on the idea of guiding segmentation outputs using flow-guided graph-cut and temporal consistency. Specifically, we design a segmentation model that incorporates intra-frame appearance and flow similarities, and inter-frame temporal continuation of the objects under consideration. We perform an extensive experimental analysis of our straightforward methodology on the standard DAVIS16 video benchmark. Though simple, our approach produces results comparable (within a range of ~2 mIoU) to the existing top approaches in unsupervised VOS. The simplicity and effectiveness of our technique open up new avenues for research in the video domain.
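
To make the core idea concrete, below is a minimal NumPy sketch of how intra-frame appearance/flow affinities and an inter-frame temporal prior could be combined into graph-cut terms. The function names, parameters (sigma_app, sigma_flow), and Gaussian weighting are illustrative assumptions for exposition, not the exact formulation used by FODVid.

```python
# Minimal sketch (not the paper's exact formulation): intra-frame
# appearance/flow affinities plus an inter-frame temporal-consistency prior.
# Parameter names (sigma_app, sigma_flow) are illustrative choices.
import numpy as np

def pairwise_weights(frame, flow, sigma_app=0.1, sigma_flow=1.0):
    """Affinity between horizontally adjacent pixels: high when both the
    RGB appearance and the optical flow of the neighbours agree."""
    d_app = np.sum((frame[:, 1:] - frame[:, :-1]) ** 2, axis=-1)
    d_flow = np.sum((flow[:, 1:] - flow[:, :-1]) ** 2, axis=-1)
    return np.exp(-d_app / (2 * sigma_app ** 2)) * np.exp(-d_flow / (2 * sigma_flow ** 2))

def warp_mask(prev_mask, flow):
    """Nearest-neighbour forward warp of the previous frame's mask along the
    flow, giving a temporal-consistency prior for the current frame."""
    h, w = prev_mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    yt = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    warped = np.zeros_like(prev_mask)
    warped[yt, xt] = prev_mask
    return warped

# Toy usage with random data; in practice `frame`, `flow`, and `prev_mask`
# come from the video, an optical-flow estimator, and the previous output.
h, w = 64, 64
frame = np.random.rand(h, w, 3)          # RGB in [0, 1]
flow = np.random.randn(h, w, 2)          # per-pixel (dx, dy) optical flow
prev_mask = np.zeros((h, w), dtype=bool)
prev_mask[16:48, 16:48] = True

w_horiz = pairwise_weights(frame, flow)  # smoothness term for a graph cut
prior = warp_mask(prev_mask, flow)       # unary term favouring temporal continuity
# A max-flow/min-cut solver would then combine such terms into the final mask.
```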