XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
Ho Kei Cheng Alexander G. Schwing
Abstract
We present XMem, a video object segmentation architecture for long videos with unified feature memory stores inspired by the Atkinson-Shiffrin memory model. Prior work on video object segmentation typically only uses one type of feature memory. For videos longer than a minute, a single feature memory model tightly links memory consumption and accuracy. In contrast, following the Atkinson-Shiffrin model, we develop an architecture that incorporates multiple independent yet deeply-connected feature memory stores: a rapidly updated sensory memory, a high-resolution working memory, and a compact, thus sustained, long-term memory. Crucially, we develop a memory potentiation algorithm that routinely consolidates actively used working memory elements into the long-term memory, which avoids memory explosion and minimizes performance decay for long-term prediction. Combined with a new memory reading mechanism, XMem greatly exceeds state-of-the-art performance on long-video datasets while being on par with state-of-the-art methods (that do not work on long videos) on short-video datasets. Code is available at https://hkchengrex.github.io/XMem
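To make the three-store design concrete, the sketch below illustrates the general idea of an Atkinson-Shiffrin-style feature memory with usage-driven consolidation. This is a hypothetical toy, not XMem's actual implementation: the class name, capacities, usage counters, and mean-pooled prototypes are all illustrative assumptions, and the real system operates on attention-based key/value feature maps rather than flat vectors.

```python
import numpy as np

class ThreeStoreMemory:
    """Toy Atkinson-Shiffrin-style memory (illustrative only, not XMem's code).

    - sensory: overwritten every frame (rapid update)
    - working: high-resolution recent features, bounded in size
    - long_term: compact prototypes consolidated from actively used elements
    """

    def __init__(self, working_capacity=4, consolidate_k=2):
        self.sensory = None          # most recent frame feature
        self.working = []            # list of [feature, usage_count]
        self.long_term = []          # compact consolidated prototypes
        self.working_capacity = working_capacity
        self.consolidate_k = consolidate_k

    def add_frame(self, feature):
        """Ingest one frame's feature vector."""
        self.sensory = feature                 # sensory memory: rapid overwrite
        self.working.append([feature, 0])      # working memory: full resolution
        if len(self.working) > self.working_capacity:
            self._consolidate()                # keep working memory bounded

    def read(self, query):
        """Return the working-memory feature most similar to the query,
        tracking usage so consolidation can favor active elements."""
        best = max(self.working, key=lambda e: float(query @ e[0]))
        best[1] += 1
        return best[0]

    def _consolidate(self):
        """Move the most actively used elements into a compact long-term
        prototype (here: a simple mean), freeing working-memory slots."""
        self.working.sort(key=lambda e: e[1], reverse=True)
        active = [f for f, _ in self.working[:self.consolidate_k]]
        self.long_term.append(np.mean(active, axis=0))
        self.working = self.working[self.consolidate_k:]
```

The point of the sketch is the decoupling the abstract describes: working memory stays bounded regardless of video length, while long-term memory grows only by compact prototypes, so memory consumption no longer scales linearly with accuracy-bearing frames.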