BioPose: Biomechanically-accurate 3D Pose Estimation from Monocular Videos

Recent advancements in 3D human pose estimation from single-camera images and videos have relied on parametric models, like SMPL. However, these models oversimplify anatomical structures, limiting their accuracy in capturing true joint locations and movements, which reduces their applicability in biomechanics, healthcare, and robotics. Biomechanically accurate pose estimation, on the other hand, typically requires costly marker-based motion capture systems and optimization techniques in specialized labs. To bridge this gap, we propose BioPose, a novel learning-based framework for predicting biomechanically accurate 3D human pose directly from monocular videos. BioPose includes three key components: a Multi-Query Human Mesh Recovery model (MQ-HMR), a Neural Inverse Kinematics (NeurIK) model, and a 2D-informed pose refinement technique. MQ-HMR leverages a multi-query deformable transformer to extract multi-scale fine-grained image features, enabling precise human mesh recovery. NeurIK treats the mesh vertices as virtual markers, applying a spatial-temporal network to regress biomechanically accurate 3D poses under anatomical constraints. To further improve 3D pose estimations, a 2D-informed refinement step optimizes the query tokens during inference by aligning the 3D structure with 2D pose observations. Experiments on benchmark datasets demonstrate that BioPose significantly outperforms state-of-the-art methods. Project website: \url{https://m-usamasaleem.github.io/publication/BioPose/BioPose.html}.
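
The idea behind the 2D-informed refinement step can be illustrated with a minimal sketch, which is not the BioPose implementation. It assumes a frozen linear map `W` standing in for the decoder that turns query tokens into 3D joints, an orthographic projection in place of a real camera model, and plain gradient descent on the squared reprojection error. The names `decode`, `project`, `reprojection_error`, and `refine_tokens` are hypothetical.

```python
import numpy as np


def decode(tokens, W, n_joints):
    """Hypothetical frozen decoder: a linear map from a token vector to 3D joints."""
    return (W @ tokens).reshape(n_joints, 3)


def project(joints3d, scale=1.0):
    """Orthographic projection (an assumption): keep x and y, drop depth."""
    return scale * joints3d[:, :2]


def reprojection_error(tokens, W, keypoints2d, n_joints, scale=1.0):
    """Sum of squared distances between projected joints and 2D observations."""
    diff = project(decode(tokens, W, n_joints), scale) - keypoints2d
    return float(np.sum(diff ** 2))


def refine_tokens(tokens, W, keypoints2d, n_joints, steps=200, scale=1.0):
    """Update only the tokens at inference time; the decoder W stays fixed."""
    # Rows of W that produce the x and y coordinates of every joint, so that
    # A @ z equals the flattened 2D projection of decode(z).
    A = scale * W.reshape(n_joints, 3, -1)[:, :2, :].reshape(2 * n_joints, -1)
    b = keypoints2d.reshape(-1)
    # Step size 1/L, where L = 2 * sigma_max(A)^2 bounds the gradient's
    # Lipschitz constant, so the loss never increases.
    lr = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
    z = tokens.astype(float).copy()
    for _ in range(steps):
        residual = A @ z - b
        z -= lr * (2.0 * A.T @ residual)
    return z
```

In this toy setting the problem is linear least squares, so a closed-form solution exists; the iterative loop is kept only because it mirrors how tokens would be optimized when the decoder is a nonlinear network and gradients come from automatic differentiation.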