8 months ago

Computer Vision

Video Processing

Object Tracking

Ailing Zeng, Xuan Ju, Lei Yang, Ruiyuan Gao, Xizhou Zhu, Bo Dai, Qiang Xu

Abstract

This paper proposes DeciWatch, a simple baseline framework for video-based 2D/3D human pose estimation that achieves a 10x efficiency improvement over existing work without any performance degradation. Unlike current solutions that estimate every frame in a video, DeciWatch introduces a simple yet effective sample-denoise-recover framework that watches only sparsely sampled frames, taking advantage of the continuity of human motion and the lightweight pose representation. Specifically, DeciWatch uniformly samples fewer than 10% of the video frames for detailed estimation, denoises the estimated 2D/3D poses with an efficient Transformer architecture, and then accurately recovers the remaining frames using another Transformer-based network. Comprehensive experimental results on three video-based human pose estimation and body mesh recovery tasks with four datasets validate the efficiency and effectiveness of DeciWatch. Code is available at https://github.com/cure-lab/DeciWatch.
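The sample-denoise-recover flow described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the two Transformer networks are replaced here by simple stand-ins (a moving average for denoising and linear interpolation for recovery), and every name (`sample_indices`, `denoise`, `recover`, `sample_denoise_recover`, `estimate_pose`) is hypothetical. It only shows how an expensive per-frame estimator ends up being called on a small, uniformly spaced subset of frames.

```python
from typing import Callable, List

Pose = List[float]  # flattened joint coordinates, e.g. [x0, y0, x1, y1, ...]


def sample_indices(num_frames: int, interval: int) -> List[int]:
    """Uniformly pick every `interval`-th frame; keep the last frame as an endpoint."""
    if num_frames <= 0 or interval <= 0:
        raise ValueError("num_frames and interval must be positive")
    idx = list(range(0, num_frames, interval))
    if idx[-1] != num_frames - 1:
        idx.append(num_frames - 1)
    return idx


def denoise(poses: List[Pose], window: int = 3) -> List[Pose]:
    """Centered moving average over the sampled poses.

    Assumption: a stand-in for the denoising Transformer, not the paper's model.
    """
    half = window // 2
    out = []
    for i in range(len(poses)):
        lo, hi = max(0, i - half), min(len(poses), i + half + 1)
        neighbours = poses[lo:hi]
        out.append([sum(vals) / len(neighbours) for vals in zip(*neighbours)])
    return out


def recover(idx: List[int], poses: List[Pose], num_frames: int) -> List[Pose]:
    """Linear interpolation between consecutive sampled poses.

    Assumption: a stand-in for the recovery Transformer, not the paper's model.
    """
    full: List[Pose] = [[] for _ in range(num_frames)]
    if len(idx) == 1:
        full[idx[0]] = list(poses[0])
    for k in range(len(idx) - 1):
        i0, i1 = idx[k], idx[k + 1]
        p0, p1 = poses[k], poses[k + 1]
        for t in range(i0, i1 + 1):
            w = (t - i0) / (i1 - i0)
            full[t] = [(1 - w) * a + w * b for a, b in zip(p0, p1)]
    return full


def sample_denoise_recover(
    num_frames: int, interval: int, estimate_pose: Callable[[int], Pose]
) -> List[Pose]:
    """Run the (hypothetical) expensive estimator only on the sampled frames."""
    idx = sample_indices(num_frames, interval)
    sampled = [estimate_pose(t) for t in idx]  # the only costly calls
    return recover(idx, denoise(sampled), num_frames)
```

As a usage example, `sample_denoise_recover(101, 10, estimator)` would call `estimator` on 11 frames instead of 101 and return one pose per frame. Whether such simple stand-ins preserve accuracy is exactly what the learned denoising and recovery networks in the paper are meant to address.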

