
Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions

Li, Zhi ; He, Lu ; Xu, Huijuan
Abstract

Action understanding has evolved into the era of fine granularity, as most human behaviors in real life have only minor differences. To detect these fine-grained actions accurately in a label-efficient way, we tackle the problem of weakly-supervised fine-grained temporal action detection in videos for the first time. Without the careful design to capture subtle differences between fine-grained actions, previous weakly-supervised models for general action detection cannot perform well in the fine-grained setting. We propose to model actions as the combinations of reusable atomic actions which are automatically discovered from data through self-supervised clustering, in order to capture the commonality and individuality of fine-grained actions. The learnt atomic actions, represented by visual concepts, are further mapped to fine and coarse action labels leveraging the semantic label hierarchy. Our approach constructs a visual representation hierarchy of four levels: clip level, atomic action level, fine action class level and coarse action class level, with supervision at each level. Extensive experiments on two large-scale fine-grained video datasets, FineAction and FineGym, show the benefit of our proposed weakly-supervised model for fine-grained action detection, and it achieves state-of-the-art results.
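
The four-level hierarchy described above (clip, atomic action, fine class, coarse class) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `hierarchy_forward`, the cosine-similarity soft assignment to cluster prototypes, the linear atomic-to-fine mapping, the temperature value, and the mean pooling to video-level scores are all assumptions chosen only to show how scores might flow from one level to the next.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def hierarchy_forward(clip_feats, prototypes, atom_to_fine, fine_to_coarse,
                      temperature=0.1):
    """Hypothetical forward pass through a four-level hierarchy.

    clip_feats:     (T, D) clip-level features
    prototypes:     (K, D) atomic-action cluster centers (assumed given)
    atom_to_fine:   (K, F) assumed linear map from atomic actions to fine classes
    fine_to_coarse: (F, C) one-hot membership of each fine class in a coarse class
    """
    # Clip level -> atomic action level: soft assignment by cosine similarity.
    f = clip_feats / np.linalg.norm(clip_feats, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    atom_probs = softmax(f @ p.T / temperature, axis=1)        # (T, K)

    # Atomic action level -> fine action class level.
    fine_probs = softmax(atom_probs @ atom_to_fine, axis=1)    # (T, F)

    # Fine level -> coarse level: sum fine probabilities under each coarse class.
    coarse_probs = fine_probs @ fine_to_coarse                 # (T, C)

    # Weak supervision only provides video-level labels, so pool over time
    # (mean pooling is an assumption made for this sketch).
    video_fine = fine_probs.mean(axis=0)
    video_coarse = coarse_probs.mean(axis=0)
    return atom_probs, fine_probs, coarse_probs, video_fine, video_coarse

# Toy usage with random inputs: 6 clips, 4 atomic actions, 3 fine and 2 coarse classes.
rng = np.random.default_rng(0)
clips = rng.normal(size=(6, 8))
protos = rng.normal(size=(4, 8))
a2f = rng.normal(size=(4, 3))
f2c = np.array([[1, 0], [1, 0], [0, 1]], dtype=float)
atom, fine, coarse, vid_fine, vid_coarse = hierarchy_forward(clips, protos, a2f, f2c)
```

In this sketch the per-clip fine-class scores are what a detector would threshold over time to localize actions, while the video-level scores are the quantities that video-level labels could supervise.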
