Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition

Ziyu Liu, Hongwen Zhang, Zhenghao Chen, Zhiyong Wang, Wanli Ouyang

Abstract

Spatial-temporal graphs have been widely used by skeleton-based action recognition algorithms to model human action dynamics. To capture robust movement patterns from these graphs, long-range and multi-scale context aggregation and spatial-temporal dependency modeling are critical aspects of a powerful feature extractor. However, existing methods have limitations in achieving (1) unbiased long-range joint relationship modeling under multi-scale operators and (2) unobstructed cross-spacetime information flow for capturing complex spatial-temporal dependencies. In this work, we present (1) a simple method to disentangle multi-scale graph convolutions and (2) a unified spatial-temporal graph convolutional operator named G3D. The proposed multi-scale aggregation scheme disentangles the importance of nodes in different neighborhoods for effective long-range modeling. The proposed G3D module leverages dense cross-spacetime edges as skip connections for direct information propagation across the spatial-temporal graph. By coupling these proposals, we develop a powerful feature extractor named MS-G3D, based on which our model outperforms previous state-of-the-art methods on three large-scale datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400.
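The abstract does not spell out how the multi-scale operator is disentangled, so the following is only a minimal sketch of one plausible reading: at scale k, a joint is connected to the joints whose shortest-path distance is exactly k (plus itself), rather than to every joint within k hops. The function name `k_hop_adjacency`, the NumPy implementation, and the five-joint chain are assumptions made for illustration, not the authors' code.

```python
import numpy as np


def k_hop_adjacency(A, k):
    """Return a binary matrix that links joints at shortest-path distance
    exactly k, plus self-loops. Hypothetical helper, not from the paper's code.

    A is a symmetric 0/1 adjacency matrix without self-loops.
    """
    n = A.shape[0]
    I = np.eye(n, dtype=int)
    if k == 0:
        return I
    A_hat = A + I  # self-loops make the k-th power cover all walks of length <= k
    within_k = (np.linalg.matrix_power(A_hat, k) >= 1).astype(int)
    within_k_minus_1 = (np.linalg.matrix_power(A_hat, k - 1) >= 1).astype(int)
    # Joints reachable within k hops but not within k - 1 hops sit at distance k.
    return I + within_k - within_k_minus_1


# Toy skeleton: a chain of five joints, 0 - 1 - 2 - 3 - 4 (an assumption).
A = np.zeros((5, 5), dtype=int)
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1

A2 = k_hop_adjacency(A, 2)
```

In this sketch, `A2` links joint 0 to joint 2 but not to joint 1, so the scale-2 operator is not dominated by the closer neighborhood. By contrast, thresholding or summing plain powers of the adjacency matrix would include nearby joints at every scale, which is one way the bias toward local joints mentioned in the abstract can arise.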