4 months ago

Action Recognition

Multimodal Representation

Computer Vision

Shahroudy, Amir; Ng, Tian-Tsong; Gong, Yihong; Wang, Gang

Abstract

Single modality action recognition on RGB or depth sequences has been extensively explored recently. It is generally accepted that each of these two modalities has different strengths and limitations for the task of action recognition. Therefore, analysis of the RGB+D videos can help us to better study the complementary properties of these two types of modalities and achieve higher levels of performance. In this paper, we propose a new deep autoencoder-based shared-specific feature factorization network to separate input multimodal signals into a hierarchy of components. Further, based on the structure of the features, a structured sparsity learning machine is proposed which utilizes mixed norms to apply regularization within components and group selection between them for better classification performance. Our experimental results show the effectiveness of our cross-modality feature analysis framework by achieving state-of-the-art accuracy for action classification on five challenging benchmark datasets.
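As a rough illustration of what a mixed norm with group selection can look like, the sketch below computes a generic L2,1 group norm with NumPy. The abstract does not say which mixed norm the authors use, so this is an assumption for illustration only and not the paper's learning machine; the function name `group_l21_norm`, the weight vector, and the grouping are all hypothetical.

```python
import numpy as np


def group_l21_norm(w, groups):
    """Sum of the Euclidean norms of each group's weights (an L2,1 mixed norm).

    The L2 norm acts within a group; the outer sum (an L1 norm over groups)
    encourages whole groups to be driven to zero, i.e. group selection.
    """
    return float(sum(np.linalg.norm(w[idx]) for idx in groups))


# Hypothetical weights over six features split into three components.
w = np.array([3.0, 4.0, 0.0, 0.0, 1.0, 0.0])
groups = [[0, 1], [2, 3], [4, 5]]

penalty = group_l21_norm(w, groups)

# Groups whose weights are not all zero are the ones that remain "selected".
active = [i for i, idx in enumerate(groups) if np.linalg.norm(w[idx]) > 0]

print(penalty, active)
```

In a classifier, a penalty of this kind would typically be added to the training loss so that entire feature components can be switched off together.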

