
Multi-scale Motion-Aware Module for Video Action Recognition

Yu-Chee Tseng, Huai-Wei Peng

Abstract

Because computing optical flow is time-consuming, recent works have proposed using the correlation operation as an alternative approach to extracting motion features. Although correlation operations yield significant improvements with negligible FLOPs, they introduce much more latency per FLOP than convolution operations, and the latency grows noticeably as a larger searching patch is applied. However, shrinking the searching patch in the correlation operation degrades its performance, because it can no longer capture larger displacements. In this paper, we propose an effective and low-latency Multi-Scale Motion-Aware (MSMA) module. It uses smaller searching patches at different scales to efficiently extract motion features from large displacements. It can be installed into different CNN backbones and generalizes well across them. When installed into TSM ResNet-50, the MSMA module introduces ≈ 17.6% more latency on an NVIDIA Tesla V100 GPU, yet it achieves state-of-the-art performance on Something-Something V1 & V2 and Diving-48.
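
To make the trade-off between searching-patch size and displacement range concrete, the following is a minimal NumPy sketch of a generic local correlation between two feature maps, plus an average-pooling helper for evaluating it at a coarser scale. It is an illustration under stated assumptions, not the MSMA module itself: the function names (`local_correlation`, `avg_pool`, `best_offset`), the use of average pooling for downsampling, and the normalization by channel count are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def local_correlation(f1, f2, max_disp):
    """Correlate f1 with f2 over a (2*max_disp+1)^2 searching patch.

    f1, f2: feature maps of shape (C, H, W) from two adjacent frames.
    Returns an array of shape (P*P, H, W), P = 2*max_disp + 1, where
    channel i*P + j holds the similarity between f1[:, y, x] and
    f2[:, y + dy, x + dx] with dy = i - max_disp, dx = j - max_disp.
    """
    c, h, w = f1.shape
    p = 2 * max_disp + 1
    # Zero-pad f2 so every candidate displacement stays in bounds.
    padded = np.pad(f2, ((0, 0), (max_disp, max_disp), (max_disp, max_disp)))
    out = np.empty((p * p, h, w), dtype=f1.dtype)
    for i in range(p):
        for j in range(p):
            shifted = padded[:, i:i + h, j:j + w]
            # Channel-wise dot product, normalized by the channel count
            # (an assumed normalization).
            out[i * p + j] = (f1 * shifted).sum(axis=0) / c
    return out


def avg_pool(f, k):
    """Downsample a (C, H, W) map by factor k (H and W divisible by k)."""
    c, h, w = f.shape
    return f.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))


def best_offset(corr, y, x):
    """Return the (dy, dx) with the highest correlation at location (y, x)."""
    p = int(round(np.sqrt(corr.shape[0])))
    max_disp = (p - 1) // 2
    idx = int(np.argmax(corr[:, y, x]))
    return (idx // p - max_disp, idx % p - max_disp)
```

In this sketch, a searching patch with `max_disp=4` evaluates 81 candidate displacements per location, while `max_disp=1` evaluates only 9. A 4-pixel shift at full resolution becomes a 1-pixel shift after `avg_pool(f, 4)`, so the small patch applied at the coarser scale can still recover it, which is the intuition behind using smaller searching patches at several scales.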