Skeleton-DML: Deep Metric Learning for Skeleton-Based One-Shot Action Recognition

One-shot action recognition allows human-performed actions to be recognized from only a single training example. This can benefit human-robot interaction by enabling the robot to react to previously unseen behaviour. We formulate the one-shot action recognition problem as a deep metric learning problem and propose a novel image-based skeleton representation that performs well in a metric learning setting. To this end, we train a model that projects the image representations into an embedding space. In this embedding space, similar actions have a low Euclidean distance, while dissimilar actions have a higher distance. The one-shot action recognition problem thus becomes a nearest-neighbor search in a set of activity reference samples. We evaluate the performance of our proposed representation against a variety of other skeleton-based image representations. In addition, we present an ablation study that shows the influence of different embedding vector sizes, losses and augmentation. Our approach improves on the state of the art by 3.3% for the one-shot action recognition protocol on the NTU RGB+D 120 dataset under a comparable training setup. With additional augmentation, our result improves by 7.7%.
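Once a trained model has mapped each sample to an embedding vector, the one-shot step reduces to a nearest-neighbor lookup under Euclidean distance. The following is a minimal Python sketch of that lookup only. It assumes the embeddings have already been computed by the embedding model, which is not shown. The function name `one_shot_classify`, the action labels and the two-dimensional toy vectors are hypothetical illustrations, not part of the described method.

```python
import math
from typing import Dict, Sequence


def one_shot_classify(query: Sequence[float],
                      references: Dict[str, Sequence[float]]) -> str:
    """Hypothetical helper: return the label of the closest reference embedding.

    `references` maps each action label to the embedding of its single
    reference sample (one example per class in the one-shot setting).
    The embedding model that produces these vectors is assumed, not shown.
    """
    if not references:
        raise ValueError("need at least one reference sample")
    # Nearest neighbor under Euclidean distance in the embedding space.
    return min(references, key=lambda label: math.dist(query, references[label]))


# Toy usage with made-up 2-D embeddings (real embeddings are larger).
refs = {"wave": [0.9, 0.1], "sit_down": [0.1, 0.8], "jump": [-0.7, -0.2]}
print(one_shot_classify([0.8, 0.2], refs))  # prints "wave"
```

Adding a new action class in this setting only requires adding one reference embedding to `references`; no retraining of the embedding model is implied by the lookup itself.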