
Holistic Interaction Transformer Network for Action Detection

Faure, Gueter Josmy ; Chen, Min-Hung ; Lai, Shang-Hong
Abstract

Actions are about how we interact with the environment, including other people, objects, and ourselves. In this paper, we propose a novel multi-modal Holistic Interaction Transformer Network (HIT) that leverages the largely ignored, but critical hand and pose information essential to most human actions. The proposed "HIT" network is a comprehensive bi-modal framework that comprises an RGB stream and a pose stream. Each of them separately models person, object, and hand interactions. Within each sub-network, an Intra-Modality Aggregation module (IMA) is introduced that selectively merges individual interaction units. The resulting features from each modality are then glued using an Attentive Fusion Mechanism (AFM). Finally, we extract cues from the temporal context to better classify the occurring actions using cached memory. Our method significantly outperforms previous approaches on the J-HMDB, UCF101-24, and MultiSports datasets. We also achieve competitive results on AVA. The code will be available at https://github.com/joslefaure/HIT.
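
The abstract does not spell out how the Attentive Fusion Mechanism combines the two streams, so the sketch below only illustrates the general idea of attention-weighted fusion of an RGB feature and a pose feature. It is not the authors' AFM. The function name `attentive_fusion`, the `query` vector (standing in for a learned parameter), and the scaled dot-product scoring are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_fusion(rgb_feat, pose_feat, query):
    """Fuse two same-length feature vectors with attention weights.

    Hypothetical illustration, not the paper's AFM: each modality is
    scored by a scaled dot product with `query` (assumed to be a learned
    vector), the two scores are normalized with softmax, and the features
    are combined as a weighted sum.
    """
    if not (len(rgb_feat) == len(pose_feat) == len(query)):
        raise ValueError("all vectors must have the same length")
    scale = math.sqrt(len(query))
    scores = [
        sum(q * x for q, x in zip(query, feat)) / scale
        for feat in (rgb_feat, pose_feat)
    ]
    weights = softmax(scores)
    fused = [
        weights[0] * r + weights[1] * p
        for r, p in zip(rgb_feat, pose_feat)
    ]
    return fused, weights


if __name__ == "__main__":
    fused, weights = attentive_fusion([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
    print("weights:", weights)
    print("fused:", fused)
```

In this toy setup the query aligns with the RGB feature, so the RGB stream receives the larger weight. In a trained network the scoring parameters would be learned, and the fusion would typically operate on batches of feature maps or tokens rather than single vectors.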
