XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
Ho Kei Cheng Alexander G. Schwing
Abstract
We present XMem, a video object segmentation architecture for long videos with unified feature memory stores inspired by the Atkinson-Shiffrin memory model. Prior work on video object segmentation typically only uses one type of feature memory. For videos longer than a minute, a single feature memory model tightly links memory consumption and accuracy. In contrast, following the Atkinson-Shiffrin model, we develop an architecture that incorporates multiple independent yet deeply-connected feature memory stores: a rapidly updated sensory memory, a high-resolution working memory, and a compact, thus sustained, long-term memory. Crucially, we develop a memory potentiation algorithm that routinely consolidates actively used working memory elements into the long-term memory, which avoids memory explosion and minimizes performance decay for long-term prediction. Combined with a new memory reading mechanism, XMem greatly exceeds state-of-the-art performance on long-video datasets while being on par with state-of-the-art methods (that do not work on long videos) on short-video datasets. Code is available at https://hkchengrex.github.io/XMem
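To make the three-store design concrete, the sketch below illustrates the general idea of an Atkinson-Shiffrin-style feature memory with usage-driven consolidation. This is a hypothetical toy, not XMem's actual implementation: the class name, capacities, usage counters, and mean-pooled prototypes are all illustrative assumptions, and the real system operates on attention-based key/value feature maps rather than flat vectors.

```python
import numpy as np

class ThreeStoreMemory:
    """Toy Atkinson-Shiffrin-style memory (illustrative only, not XMem's code).

    - sensory: overwritten every frame (rapid update)
    - working: high-resolution recent features, bounded in size
    - long_term: compact prototypes consolidated from actively used elements
    """

    def __init__(self, working_capacity=4, consolidate_k=2):
        self.sensory = None          # most recent frame feature
        self.working = []            # list of [feature, usage_count]
        self.long_term = []          # compact consolidated prototypes
        self.working_capacity = working_capacity
        self.consolidate_k = consolidate_k

    def add_frame(self, feature):
        """Ingest one frame's feature vector."""
        self.sensory = feature                 # sensory memory: rapid overwrite
        self.working.append([feature, 0])      # working memory: full resolution
        if len(self.working) > self.working_capacity:
            self._consolidate()                # keep working memory bounded

    def read(self, query):
        """Return the working-memory feature most similar to the query,
        tracking usage so consolidation can favor active elements."""
        best = max(self.working, key=lambda e: float(query @ e[0]))
        best[1] += 1
        return best[0]

    def _consolidate(self):
        """Move the most actively used elements into a compact long-term
        prototype (here: a simple mean), freeing working-memory slots."""
        self.working.sort(key=lambda e: e[1], reverse=True)
        active = [f for f, _ in self.working[:self.consolidate_k]]
        self.long_term.append(np.mean(active, axis=0))
        self.working = self.working[self.consolidate_k:]
```

The point of the sketch is the decoupling the abstract describes: working memory stays bounded regardless of video length, while long-term memory grows only by compact prototypes, so memory consumption no longer scales linearly with accuracy-bearing frames.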