HyperAI
Action Recognition in Videos on NTU RGB+D 120
Metrics
Accuracy (Cross-Setup)
Accuracy (Cross-Subject)
Results
Performance results of various models on this benchmark
| Model Name | Accuracy (Cross-Setup) | Accuracy (Cross-Subject) | Paper Title | Repository |
|---|---|---|---|---|
| DSCNet (RGB + Pose) | 96.7 | 95.6 | A Dense-Sparse Complementary Network for Human Action Recognition based on RGB and Skeleton Modalities | |
| Body Pose Evolution Map | 64.6 | 66.9 | Recognizing Human Actions as the Evolution of Pose Estimation Maps | - |
| 3DA (RGB + Pose) | 91.4 | 90.5 | Cross-Modal Learning with 3D Deformable Attention for Action Recognition | - |
| Gimme Signals (AIS) | 70.8 | 71.59 | Gimme Signals: Discriminative signal encoding for multimodal activity recognition | |
| Skelemotion + Yang et al. (skeleton only) | 66.9 | 67.7 | SkeleMotion: A New Representation of Skeleton Joint Sequences Based on Motion Information for 3D Action Recognition | |
| π-ViT (RGB only) | 91.9 | 92.9 | Just Add $π$! Pose Induced Video Transformers for Understanding Activities of Daily Living | |
| VPN++ (RGB + Pose) | 90.7 | 92.5 | VPN++: Rethinking Video-Pose embeddings for understanding Activities of Daily Living | |
| π-ViT (RGB + Pose) | 96.1 | 95.1 | Just Add $π$! Pose Induced Video Transformers for Understanding Activities of Daily Living | |
| DVANet (RGB only) | 90.4 | 91.6 | DVANet: Disentangling View and Action Features for Multi-View Action Recognition | |
| TSRJI | 67.9 | 62.8 | Skeleton Image Representation for 3D Action Recognition based on Tree Structure and Reference Joints | |
| ST-GCN + AS-GCN w/DH-TCN | 78.3 | 79.2 | Vertex Feature Encoding and Hierarchical Temporal Modeling in a Spatial-Temporal Graph Convolutional Network for Action Recognition | - |
| VPN (RGB + Pose) | 86.3 | 87.8 | VPN: Learning Video-Pose Embedding for Activities of Daily Living | |
| EPP-Net (Parsing + Pose) | 92.8 | 91.1 | Explore Human Parsing Modality for Action Recognition | |
| ViewCon (RGB) | 87.5 | 85.6 | Multi-View Action Recognition Using Contrastive Learning | |
| IPP-Net (Parsing + Pose) | 91.7 | 90.0 | Integrating Human Parsing and Pose Network for Human Action Recognition | |
| STAR-Transformer (RGB + Pose) | 92.7 | 90.3 | STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition | - |
| MMNet (RGB + Pose) | 94.4 | 92.9 | MMNet: A Model-Based Multimodal Network for Human Action Recognition in RGB-D Videos | |
| PoseC3D (RGB + Pose) | 96.4 | 95.3 | Revisiting Skeleton-based Action Recognition | |