Best Practices for 2-Body Pose Forecasting

The task of collaborative human pose forecasting consists of predicting the future poses of multiple interacting people, given their poses in previous frames. Predicting two interacting people jointly, instead of each separately, promises better performance, due to their body-body motion correlations. But the task has so far remained largely unexplored. In this paper, we review the progress in human pose forecasting and provide an in-depth assessment of the single-person practices that perform best for 2-body collaborative motion forecasting. Our study confirms the positive impact of frequency input representations, space-time separable and fully-learnable interaction adjacencies for the encoding GCN, and FC decoding. Other single-person practices do not transfer to the 2-body setting, so the proposed best practices do not include hierarchical body modeling or attention-based interaction encoding. We further contribute a novel initialization procedure for the 2-body spatial interaction parameters of the encoder, which benefits performance and stability. Altogether, our proposed 2-body pose forecasting best practices yield a performance improvement of 21.9% over the state-of-the-art on the most recent ExPI dataset, with the novel initialization accounting for 3.5%. See our project page at https://www.pinlab.org/bestpractices2body
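
To make two of the ingredients named above concrete, the following is a minimal NumPy sketch of a frequency input representation (here an orthonormal DCT along time, one common choice) followed by a single space-time separable graph-convolution layer with fully-learnable adjacencies. It is an illustration under stated assumptions, not the paper's implementation: the function names, the tensor sizes, the tanh nonlinearity, and the identity-plus-noise initialization are all hypothetical placeholders, and the initialization in particular is a generic one, not the novel procedure proposed in the paper.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix; each row is one frequency."""
    k = np.arange(n)[:, None]          # frequency index
    t = np.arange(n)[None, :]          # time index
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    basis[0] /= np.sqrt(2.0)           # rescale the constant row so rows are orthonormal
    return basis

def separable_gcn_layer(x, a_freq, a_joint, w):
    """One separable graph-convolution layer (hypothetical helper).

    x:       (F, J, C_in) features, F frequency bins, J joints of both people
    a_freq:  (F, F) learnable adjacency over the frequency/temporal axis
    a_joint: (J, J) learnable adjacency over joints; its off-diagonal
             person-to-person blocks carry the 2-body interaction
    w:       (C_in, C_out) channel-mixing weights
    """
    x = np.einsum("fg,gjc->fjc", a_freq, x)    # mix along the frequency axis
    x = np.einsum("jk,fkc->fjc", a_joint, x)   # mix along the joint axis
    return np.tanh(x @ w)                      # mix channels, then nonlinearity

rng = np.random.default_rng(0)
# Illustrative sizes only (assumptions, not taken from the paper).
T, JOINTS_PER_PERSON, C_IN, C_OUT = 10, 18, 3, 16
J = 2 * JOINTS_PER_PERSON                      # joints of both people stacked

poses = rng.standard_normal((T, J, C_IN))      # stand-in for observed 3D poses
D = dct_matrix(T)
coeffs = np.einsum("ft,tjc->fjc", D, poses)    # time -> frequency representation

# Generic identity-plus-noise initialization (NOT the paper's procedure).
a_freq = np.eye(T) + 0.01 * rng.standard_normal((T, T))
a_joint = np.eye(J) + 0.01 * rng.standard_normal((J, J))
w = 0.1 * rng.standard_normal((C_IN, C_OUT))

features = separable_gcn_layer(coeffs, a_freq, a_joint, w)
recovered = np.einsum("ft,fjc->tjc", D, coeffs)  # inverse DCT via the transpose
```

Factoring the graph into one adjacency over the temporal (frequency) axis and one over joints keeps the parameter count at F² + J² instead of (F·J)² for a single joint space-time adjacency, which is the usual motivation for the separable design. In a trained model, `a_freq`, `a_joint`, and `w` would be optimized by gradient descent rather than left at their initial values.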