6 months ago

Action Recognition

Convolutional Neural Network

Video Processing

Method/Architecture

Computer Vision

Yu-Chee Tseng, Huai-Wei Peng

Abstract

Because optical flow takes a long time to compute, recent works have proposed using the correlation operation as an alternative approach to extracting motion features. Although correlation operations show significant improvement with negligible FLOPs, they introduce much more latency per FLOP than convolution operations, and the latency grows noticeably as a larger searching patch is applied. Shrinking the searching patch, however, degrades performance because the correlation operation can no longer capture larger displacements. In this paper, we propose an effective and low-latency Multi-Scale Motion-Aware (MSMA) module. It uses smaller searching patches at different scales to efficiently extract motion features from large displacements. It can be installed into, and generalizes well on, different CNN backbones. When installed into TSM ResNet-50, the MSMA module introduces ≈ 17.6% more latency on an NVIDIA Tesla V100 GPU, yet it achieves state-of-the-art performance on Something-Something V1 & V2 and Diving-48.
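The idea of pairing a small searching patch with several feature scales can be illustrated with a minimal NumPy sketch. This is not the authors' MSMA module: the function names, the use of average pooling to build the coarser scales, and the choice of scale factors are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def local_correlation(feat_a, feat_b, radius):
    """Correlate each position of feat_a with a (2r+1) x (2r+1) patch of feat_b.

    feat_a, feat_b: arrays of shape (C, H, W) from two consecutive frames.
    Returns an array of shape ((2r+1)**2, H, W), one map per candidate offset.
    """
    c, h, w = feat_a.shape
    # Zero-pad the spatial dimensions so every offset stays in bounds.
    padded = np.pad(feat_b, ((0, 0), (radius, radius), (radius, radius)))
    maps = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # shifted[:, y, x] equals feat_b[:, y + dy, x + dx] (zero outside).
            shifted = padded[:, radius + dy:radius + dy + h,
                             radius + dx:radius + dx + w]
            maps.append((feat_a * shifted).sum(axis=0) / c)
    return np.stack(maps)

def downsample(feat, factor):
    """Average-pool (C, H, W) features by an integer factor (assumed choice)."""
    c, h, w = feat.shape
    return feat.reshape(c, h // factor, factor,
                        w // factor, factor).mean(axis=(2, 4))

def multi_scale_correlation(feat_a, feat_b, radius=1, factors=(1, 2, 4)):
    """Run the same small-patch correlation at several (hypothetical) scales."""
    return [
        local_correlation(downsample(feat_a, f), downsample(feat_b, f), radius)
        for f in factors
    ]

# Toy example: the second frame is the first one shifted 4 pixels to the right.
rng = np.random.default_rng(0)
frame_a = rng.standard_normal((16, 32, 32))
frame_b = np.roll(frame_a, 4, axis=2)
maps = multi_scale_correlation(frame_a, frame_b, radius=1, factors=(1, 2, 4))
```

In this toy setup the 4-pixel shift lies outside a radius-1 patch at full resolution, but at the 4x coarser scale it shrinks to a single coarse pixel and falls inside the same small patch. The cost per scale stays at nine offsets, whereas covering the shift at full resolution would need a radius of 4, or 81 offsets.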



Multi-scale Motion-Aware Module for Video Action Recognition