Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang

Abstract

We present the Modular interactive VOS (MiVOS) framework, which decouples interaction-to-mask and mask propagation, allowing for higher generalizability and better performance. Trained separately, the interaction module converts user interactions to an object mask, which is then temporally propagated by our propagation module using a novel top-k filtering strategy in reading the space-time memory. To effectively take the user's intent into account, a novel difference-aware module is proposed to learn how to properly fuse the masks before and after each interaction, which are aligned with the target frames by employing the space-time memory. We evaluate our method both qualitatively and quantitatively with different forms of user interactions (e.g., scribbles, clicks) on DAVIS to show that our method outperforms current state-of-the-art algorithms while requiring fewer frame interactions, with the additional advantage of generalizing to different types of user interactions. We contribute a large-scale synthetic VOS dataset with pixel-accurate segmentation of 4.8M frames to accompany our source code to facilitate future research.
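The abstract gives no details of the top-k filtering strategy, so the following is only a minimal sketch of what such a memory read could look like for a single query location. It assumes a dot-product affinity and a softmax restricted to the k best-matching memory entries; the function name `topk_memory_read` and its arguments are hypothetical and are not taken from the authors' code.

```python
import math


def topk_memory_read(query, keys, values, k):
    """Read from memory for one query vector, keeping only the k best matches.

    query:  list of floats, the feature at one query location
    keys:   list of key vectors, one per memory location
    values: list of value vectors, one per memory location
    k:      number of memory locations allowed to contribute
    """
    # Affinity between the query and every memory key
    # (dot product is an assumption made for this sketch).
    scores = [sum(q * m for q, m in zip(query, key)) for key in keys]

    # Indices of the k highest-scoring memory locations.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

    # Softmax restricted to the top-k entries; all other entries get weight 0.
    peak = max(scores[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(scores[i] - peak) for i in top}
    total = sum(exps.values())
    weights = {i: e / total for i, e in exps.items()}

    # Weighted sum of the selected memory values.
    dim = len(values[0])
    readout = [sum(weights[i] * values[i][d] for i in top) for d in range(dim)]
    return readout, weights


if __name__ == "__main__":
    keys = [[10.0, 0.0], [9.0, 0.0], [-5.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    readout, weights = topk_memory_read([1.0, 0.0], keys, values, k=2)
    print(readout, weights)
```

In this sketch the third memory entry never contributes when `k=2`, regardless of its value vector. The general intuition is that discarding low-affinity matches keeps them from diluting the readout as the memory grows.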
