KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation

This paper presents a novel Kinematics and Trajectory Prior Knowledge-Enhanced Transformer (KTPFormer), which overcomes a weakness in existing transformer-based methods for 3D human pose estimation: the derivation of the Q, K, and V vectors in their self-attention mechanisms is based solely on simple linear mapping. We propose two prior attention modules, namely Kinematics Prior Attention (KPA) and Trajectory Prior Attention (TPA), to take advantage of the known anatomical structure of the human body and motion trajectory information, facilitating effective learning of global dependencies and features in the multi-head self-attention. KPA models kinematic relationships in the human body by constructing a kinematic topology, while TPA builds a trajectory topology to learn joint motion trajectory information across frames. By yielding Q, K, and V vectors enriched with prior knowledge, the two modules enable KTPFormer to model both spatial and temporal correlations simultaneously. Extensive experiments on three benchmarks (Human3.6M, MPI-INF-3DHP, and HumanEva) show that KTPFormer achieves superior performance in comparison to state-of-the-art methods. More importantly, our KPA and TPA modules have lightweight plug-and-play designs and can be integrated into various transformer-based networks (e.g., diffusion-based ones) to improve performance with only a very small increase in computational overhead. The code is available at: https://github.com/JihuaPeng/KTPFormer.
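The central idea, replacing plain linear Q/K/V projections with projections informed by a skeletal topology, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the toy 5-joint chain, the symmetric graph normalization, and all weight shapes are assumptions made for the sketch.

```python
import numpy as np

# Toy setting: J joints with C-dim features each (shapes are illustrative).
J, C = 5, 8
rng = np.random.default_rng(0)
X = rng.standard_normal((J, C))  # per-joint input features

# Kinematic topology: adjacency over a toy 5-joint chain with self-loops
# (an assumption; the paper's topology covers the full human skeleton).
A = np.eye(J)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1.0

# Symmetric normalization D^{-1/2} A D^{-1/2}, as in standard graph convolution.
d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_hat = d_inv_sqrt @ A @ d_inv_sqrt

W_q, W_k, W_v = (rng.standard_normal((C, C)) for _ in range(3))

# Baseline transformer: Q from a simple linear mapping of X.
Q_plain = X @ W_q

# Prior-enhanced variant: features are first aggregated over the kinematic
# topology, so the resulting Q/K/V carry structural prior knowledge.
Q_prior = (A_hat @ X) @ W_q
K_prior = (A_hat @ X) @ W_k
V_prior = (A_hat @ X) @ W_v

# Standard scaled dot-product attention over the prior-enhanced vectors.
scores = Q_prior @ K_prior.T / np.sqrt(C)
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)
out = attn @ V_prior
```

A temporal (TPA-style) analogue would apply the same pattern with an adjacency over frames of one joint's trajectory instead of over joints within one frame.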