Action Recognition in Videos on NTU RGB+D 120

Metrics

Accuracy (Cross-Setup)

Accuracy (Cross-Subject)
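The two metrics are the dataset's standard evaluation protocols, and both report top-1 classification accuracy in percent. As we understand the NTU RGB+D 120 dataset paper, cross-setup trains on samples with even collection-setup IDs and tests on odd ones, while cross-subject splits the 106 performers into two groups of 53. The Python sketch below assumes the dataset's `SsssCcccPpppRrrrAaaa` sample naming. It leaves the training-subject list as a parameter that should be filled from the official split. The function names (`parse_name`, `is_train_cross_setup`, `is_train_cross_subject`, `top1_accuracy`) are hypothetical and not part of any official toolkit.

```python
import re
from typing import AbstractSet, Sequence

# Assumed NTU RGB+D naming: setup, camera, performer (subject),
# replication and action IDs, e.g. "S018C001P008R001A061".
_NAME_RE = re.compile(r"S(\d{3})C(\d{3})P(\d{3})R(\d{3})A(\d{3})")


def parse_name(name: str) -> dict:
    """Extract the integer IDs encoded in an NTU-style sample name."""
    m = _NAME_RE.search(name)
    if m is None:
        raise ValueError(f"not an NTU sample name: {name!r}")
    setup, camera, subject, rep, action = map(int, m.groups())
    return {"setup": setup, "camera": camera, "subject": subject,
            "replication": rep, "action": action}


def is_train_cross_setup(name: str) -> bool:
    """Cross-setup protocol: even setup IDs train, odd setup IDs test."""
    return parse_name(name)["setup"] % 2 == 0


def is_train_cross_subject(name: str, train_subjects: AbstractSet[int]) -> bool:
    """Cross-subject protocol: a sample is in the training split if its
    performer ID is in the official training-subject list (passed in)."""
    return parse_name(name)["subject"] in train_subjects


def top1_accuracy(pred: Sequence[int], true: Sequence[int]) -> float:
    """Percentage of samples whose predicted label equals the ground truth."""
    if len(pred) != len(true) or not true:
        raise ValueError("need equal-length, non-empty label sequences")
    correct = sum(p == t for p, t in zip(pred, true))
    return 100.0 * correct / len(true)


if __name__ == "__main__":
    # Toy subject set for illustration only; NOT the official split.
    toy_train_subjects = {1, 2, 4}
    print(is_train_cross_setup("S018C001P008R001A061"))  # True: setup 18 is even
    print(is_train_cross_subject("S018C001P008R001A061", toy_train_subjects))  # False
    print(top1_accuracy([1, 2, 3, 3], [1, 2, 3, 4]))  # 75.0
```

Each leaderboard entry reports `top1_accuracy` on the test split of each protocol, so the two columns below come from two separately trained and evaluated splits rather than from one model run.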

Results

Performance results of various models on this benchmark

Model Name	Accuracy (Cross-Setup, %)	Accuracy (Cross-Subject, %)	Paper Title	Repository
DSCNet (RGB + Pose)	96.7	95.6	A Dense-Sparse Complementary Network for Human Action Recognition based on RGB and Skeleton Modalities	-
Body Pose Evolution Map	64.6	66.9	Recognizing Human Actions as the Evolution of Pose Estimation Maps	-
3DA (RGB + Pose)	91.4	90.5	Cross-Modal Learning with 3D Deformable Attention for Action Recognition	-
Gimme Signals (AIS)	70.8	71.59	Gimme Signals: Discriminative signal encoding for multimodal activity recognition
Skelemotion + Yang et al. (skeleton only)	66.9	67.7	SkeleMotion: A New Representation of Skeleton Joint Sequences Based on Motion Information for 3D Action Recognition
π-ViT (RGB only)	91.9	92.9	Just Add π! Pose Induced Video Transformers for Understanding Activities of Daily Living
VPN++ (RGB + Pose)	90.7	92.5	VPN++: Rethinking Video-Pose embeddings for understanding Activities of Daily Living
π-ViT (RGB + Pose)	96.1	95.1	Just Add π! Pose Induced Video Transformers for Understanding Activities of Daily Living
DVANet (RGB only)	90.4	91.6	DVANet: Disentangling View and Action Features for Multi-View Action Recognition
TSRJI	67.9	62.8	Skeleton Image Representation for 3D Action Recognition based on Tree Structure and Reference Joints
ST-GCN + AS-GCN w/DH-TCN	78.3	79.2	Vertex Feature Encoding and Hierarchical Temporal Modeling in a Spatial-Temporal Graph Convolutional Network for Action Recognition	-
VPN (RGB + Pose)	86.3	87.8	VPN: Learning Video-Pose Embedding for Activities of Daily Living
EPP-Net (Parsing + Pose)	92.8	91.1	Explore Human Parsing Modality for Action Recognition
ViewCon (RGB)	87.5	85.6	Multi-View Action Recognition Using Contrastive Learning	-
IPP-Net (Parsing + Pose)	91.7	90.0	Integrating Human Parsing and Pose Network for Human Action Recognition
STAR-Transformer (RGB + Pose)	92.7	90.3	STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition	-
MMNet (RGB + Pose)	94.4	92.9	MMNet: A Model-Based Multimodal Network for Human Action Recognition in RGB-D Videos	-
PoseC3D (RGB + Pose)	96.4	95.3	Revisiting Skeleton-based Action Recognition
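The table is not sorted and mixes input modalities, with inconsistent tags such as "(RGB only)" and "(RGB)". One way to compare like with like is to filter by modality before ranking on a single protocol. The sketch below copies a few rows from the table. The `Entry` type, `ENTRIES` list, `rank` helper, and the "no 'Pose' in the name" filter are all illustrative assumptions.

```python
from typing import Callable, List, NamedTuple, Optional


class Entry(NamedTuple):
    model: str
    cross_setup: float    # Accuracy (Cross-Setup), %
    cross_subject: float  # Accuracy (Cross-Subject), %


# A subset of rows copied from the results table.
ENTRIES = [
    Entry("DSCNet (RGB + Pose)", 96.7, 95.6),
    Entry("PoseC3D (RGB + Pose)", 96.4, 95.3),
    Entry("π-ViT (RGB + Pose)", 96.1, 95.1),
    Entry("π-ViT (RGB only)", 91.9, 92.9),
    Entry("DVANet (RGB only)", 90.4, 91.6),
    Entry("ViewCon (RGB)", 87.5, 85.6),
]


def rank(entries: List[Entry], metric: str,
         keep: Optional[Callable[[Entry], bool]] = None) -> List[Entry]:
    """Return entries sorted by one metric (highest first), optionally
    keeping only those for which `keep(entry)` is true."""
    pool = [e for e in entries if keep is None or keep(e)]
    return sorted(pool, key=lambda e: getattr(e, metric), reverse=True)


if __name__ == "__main__":
    # Heuristic RGB-only filter: drop any model whose name mentions Pose.
    rgb_only = rank(ENTRIES, "cross_subject", keep=lambda e: "Pose" not in e.model)
    print([e.model for e in rgb_only])
    # ['π-ViT (RGB only)', 'DVANet (RGB only)', 'ViewCon (RGB)']
```

Filtering on the name string is fragile because the modality tags are not standardized. A cleaner design would store modality as its own field.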
