Improving Action Quality Assessment using Weighted Aggregation

Action quality assessment (AQA) aims at automatically judging a human action from a video of that action and assigning it a performance score. Most existing works on AQA divide RGB videos into short clips, transform these clips into higher-level representations using Convolutional 3D (C3D) networks, and aggregate the representations through averaging. These higher-level representations are then used to perform AQA. We find that the current clip-level feature aggregation technique of averaging is insufficient to capture the relative importance of clip-level features. In this work, we propose a learning-based weighted-averaging technique, which we call Weight-Decider (WD). Using this technique, better performance can be obtained without a large increase in computational cost. We also experiment with ResNets for learning better representations for action quality assessment, and we assess the effects of the depth and input clip size of the convolutional neural network on the quality of the predicted action scores. Using a 34-layer (2+1)D ResNet capable of processing 32-frame clips, together with WD aggregation, we achieve a new state-of-the-art Spearman's rank correlation of 0.9315 (an increase of 0.45%) on the MTL-AQA dataset.
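The core idea of replacing plain averaging with a learned weighted average of clip-level features can be sketched as follows. This is a minimal illustration, not the paper's exact Weight-Decider architecture: it assumes a hypothetical linear scorer (`w`) that assigns each clip a relevance score, which is softmax-normalized into aggregation weights.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

def weighted_aggregate(clip_feats, w):
    # clip_feats: (num_clips, feat_dim) higher-level features, one row per clip
    # w: (feat_dim,) parameters of a hypothetical linear scorer standing in
    #    for the learned Weight-Decider module
    scores = clip_feats @ w        # one relevance score per clip
    alphas = softmax(scores)       # normalized clip weights, summing to 1
    return alphas @ clip_feats     # weighted average replaces the plain mean

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8))   # e.g. 4 clips with 8-dim features
w = rng.standard_normal(8)
video_feat = weighted_aggregate(feats, w)
```

Note that with a zero scorer (all scores equal), the softmax weights become uniform and the scheme reduces to the ordinary clip-level average, so learned weighting strictly generalizes the baseline aggregation.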