2 months ago

On the Utility of 3D Hand Poses for Action Recognition

Shamil, Md Salman ; Chatterjee, Dibyadip ; Sener, Fadime ; Ma, Shugao ; Yao, Angela
Abstract

3D hand pose is an underexplored modality for action recognition. Poses are compact yet informative and can greatly benefit applications with limited compute budgets. However, poses alone offer an incomplete understanding of actions, as they cannot fully capture objects and environments with which humans interact. We propose HandFormer, a novel multimodal transformer, to efficiently model hand-object interactions. HandFormer combines 3D hand poses at a high temporal resolution for fine-grained motion modeling with sparsely sampled RGB frames for encoding scene semantics. Observing the unique characteristics of hand poses, we temporally factorize hand modeling and represent each joint by its short-term trajectories. This factorized pose representation combined with sparse RGB samples is remarkably efficient and highly accurate. Unimodal HandFormer with only hand poses outperforms existing skeleton-based methods at 5x fewer FLOPs. With RGB, we achieve new state-of-the-art performance on Assembly101 and H2O with significant improvements in egocentric action recognition.
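To make the idea of temporal factorization concrete, the following is a minimal NumPy sketch of the data layout only, not the authors' implementation. It assumes a pose sequence of shape (frames, joints, 3), splits it into fixed-length windows, flattens each joint's coordinates within a window into a short-term trajectory vector, and picks one RGB frame index per window. The function names, the window length of 15, and the 42-joint layout (two hands with 21 joints each) are illustrative assumptions, not values taken from the abstract.

```python
import numpy as np


def factorize_pose_sequence(poses: np.ndarray, window: int) -> np.ndarray:
    """Split a dense pose sequence into short per-joint trajectories.

    poses:  array of shape (T, J, 3), one 3D position per joint per frame.
    window: number of frames per segment (hypothetical hyperparameter).
    Returns an array of shape (S, J, window * 3), where S = T // window.
    Trailing frames that do not fill a whole window are dropped.
    """
    num_frames, num_joints, num_coords = poses.shape
    num_segments = num_frames // window
    trimmed = poses[: num_segments * window]
    # (S, window, J, 3): group consecutive frames into segments.
    segments = trimmed.reshape(num_segments, window, num_joints, num_coords)
    # (S, J, window, 3) -> (S, J, window * 3): one trajectory vector per joint.
    return segments.transpose(0, 2, 1, 3).reshape(
        num_segments, num_joints, window * num_coords
    )


def sparse_frame_indices(num_frames: int, window: int) -> np.ndarray:
    """Pick the middle frame of each segment as its single RGB sample."""
    num_segments = num_frames // window
    return np.arange(num_segments) * window + window // 2


# Synthetic stand-in for a pose stream: 120 frames, 42 joints (assumed layout).
poses = np.random.default_rng(0).normal(size=(120, 42, 3))
trajectories = factorize_pose_sequence(poses, window=15)
rgb_indices = sparse_frame_indices(num_frames=120, window=15)

print(trajectories.shape)  # (8, 42, 45)
print(rgb_indices)         # [  7  22  37  52  67  82  97 112]
```

In this layout each segment pairs a dense block of joint trajectories with a single RGB frame, which is one plausible way to read "high temporal resolution" poses alongside "sparsely sampled" frames; how the model actually encodes and fuses the two streams is not specified in the abstract.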
