Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation

Wenhao Li, Hong Liu†, Runwei Ding, Mengyuan Liu, Pichao Wang, Wenming Yang

Abstract

Despite the great progress in 3D human pose estimation from videos, it is still an open problem to take full advantage of a redundant 2D pose sequence to learn a representative representation for generating one 3D pose. To this end, we propose an improved Transformer-based architecture, called Strided Transformer, which simply and effectively lifts a long sequence of 2D joint locations to a single 3D pose. Specifically, a Vanilla Transformer Encoder (VTE) is adopted to model long-range dependencies of 2D pose sequences. To reduce the redundancy of the sequence, the fully-connected layers in the feed-forward network of VTE are replaced with strided convolutions that progressively shrink the sequence length and aggregate information from local contexts. The modified VTE is termed the Strided Transformer Encoder (STE), which is built upon the outputs of VTE. STE not only effectively aggregates long-range information into a single-vector representation in a hierarchical global and local fashion, but also significantly reduces the computation cost. Furthermore, a full-to-single supervision scheme is designed at both the full-sequence and single-target-frame scales, applied to the outputs of VTE and STE, respectively. This scheme imposes extra temporal smoothness constraints in conjunction with the single-target-frame supervision and hence helps produce smoother and more accurate 3D poses. The proposed Strided Transformer is evaluated on two challenging benchmark datasets, Human3.6M and HumanEva-I, and achieves state-of-the-art results with fewer parameters. Code and models are available at https://github.com/Vegetebird/StridedTransformer-Pose3D.
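
The core STE mechanism, replacing a fully-connected layer with a strided temporal convolution so that each layer shortens the sequence, can be illustrated with a minimal, framework-free sketch. This is not the authors' implementation: the names `strided_conv1d` and `averaging_kernel` are hypothetical, the 27-frame input, kernel size 3, stride 3 and averaging weights are illustrative assumptions, and attention, normalization and residual connections are omitted. With no padding, each layer maps T frames to (T - k) // s + 1 frames.

```python
# Hypothetical sketch: a strided 1D temporal convolution standing in for the
# fully-connected layer of a Transformer feed-forward network. Toy weights only.

def strided_conv1d(seq, weight, stride):
    """Apply a 1D convolution over time with no padding.

    seq:    list of T frames, each a list of C_in features.
    weight: nested list [C_out][C_in][k] of kernel weights.
    stride: temporal step between kernel applications.
    Returns T_out = (T - k) // stride + 1 frames of C_out features each.
    """
    k = len(weight[0][0])
    t_out = (len(seq) - k) // stride + 1
    out = []
    for t in range(t_out):
        start = t * stride
        frame = []
        for w_out in weight:  # one output channel at a time
            acc = 0.0
            for c_in, w_in in enumerate(w_out):
                for j in range(k):
                    acc += w_in[j] * seq[start + j][c_in]
            frame.append(acc)
        out.append(frame)
    return out


def averaging_kernel(channels, k):
    """Toy kernel: each output channel averages its own input channel over k frames."""
    return [[[1.0 / k if c_in == c_out else 0.0 for _ in range(k)]
             for c_in in range(channels)]
            for c_out in range(channels)]


# Assumed setting: 27 input frames with 2 features, kernel size 3, stride 3 per layer.
seq = [[float(t), 2.0 * t] for t in range(27)]
kernel = averaging_kernel(channels=2, k=3)
lengths = [len(seq)]
x = seq
while len(x) > 1:
    x = strided_conv1d(x, kernel, stride=3)
    lengths.append(len(x))
print(lengths)                          # [27, 9, 3, 1]
print([round(v, 6) for v in x[0]])      # [13.0, 26.0]
```

Because each layer divides the sequence length by the stride, three layers reduce 27 frames to a single vector, and each deeper layer processes fewer frames than the one before it.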

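The full-to-single supervision scheme can be sketched as a weighted sum of two terms: a per-frame error over the whole sequence predicted from the VTE output, and an error on the single pose predicted from the STE output. This sketch assumes an MPJPE-style loss (mean Euclidean distance per joint), takes the target frame to be the centre frame, and uses hypothetical weights `lam_full` and `lam_single`; the paper's exact loss terms and weights may differ.

```python
import math

# Hypothetical sketch of a full-to-single supervision objective.
# A pose is a list of (x, y, z) joint tuples; a sequence is a list of poses.

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def full_to_single_loss(full_pred, single_pred, gt_seq, lam_full=1.0, lam_single=1.0):
    """Combine full-sequence and single-target-frame supervision.

    full_pred:   per-frame 3D poses predicted from the VTE output (length T).
    single_pred: one 3D pose predicted from the STE output.
    gt_seq:      ground-truth 3D poses for all T frames; the target is
                 assumed to be the centre frame.
    lam_full, lam_single: hypothetical weighting factors.
    """
    t = len(gt_seq)
    full_term = sum(mpjpe(p, g) for p, g in zip(full_pred, gt_seq)) / t
    single_term = mpjpe(single_pred, gt_seq[t // 2])
    return lam_full * full_term + lam_single * single_term


# Toy example: 3 frames, 2 joints, ground truth at the origin.
gt_seq = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)] for _ in range(3)]
full_pred = [[(3.0, 4.0, 0.0), (3.0, 4.0, 0.0)] for _ in range(3)]  # 5.0 off per joint
single_pred = [(0.0, 0.0, 2.0), (0.0, 0.0, 2.0)]                     # 2.0 off per joint
print(full_to_single_loss(full_pred, single_pred, gt_seq))  # 7.0
```

The full-sequence term is what adds the extra temporal constraint: every frame's prediction is penalized, not only the target frame's.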