Back to MLP: A Simple Baseline for Human Motion Prediction

This paper tackles the problem of human motion prediction, which consists of forecasting future body poses from historically observed sequences. State-of-the-art approaches provide good results; however, they rely on deep learning architectures of arbitrary complexity, such as Recurrent Neural Networks (RNNs), Transformers, or Graph Convolutional Networks (GCNs), typically requiring multiple training stages and more than 2 million parameters. In this paper, we show that, when combined with a series of standard practices, such as applying the Discrete Cosine Transform (DCT), predicting the residual displacement of joints, and optimizing velocity as an auxiliary loss, a lightweight network based on multi-layer perceptrons (MLPs) with only 0.14 million parameters can surpass state-of-the-art performance. An exhaustive evaluation on the Human3.6M, AMASS, and 3DPW datasets shows that our method, named siMLPe, consistently outperforms all other approaches. We hope that our simple method can serve as a strong baseline for the community and encourage rethinking of the human motion prediction problem. The code is publicly available at \url{https://github.com/dulucas/siMLPe}.
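
To make the three standard practices named above concrete, the following is a minimal NumPy sketch, not the siMLPe implementation. It assumes a pose sequence stored as a `(T, D)` array (T frames, D flattened joint coordinates) and an output horizon equal to the input length. The names `dct_matrix`, `toy_mlp`, `predict`, and `loss_with_velocity` are hypothetical, the single linear layer stands in for the actual network, and the equal weighting of the two loss terms is an assumption.

```python
import numpy as np

def dct_matrix(n):
    """Return the orthonormal DCT-II matrix of shape (n, n)."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    t = np.arange(n)[None, :]   # time index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)        # rescale the DC row so that m @ m.T == I
    return m

def toy_mlp(x, w, b):
    """Hypothetical stand-in network: one linear layer over the time axis."""
    return w @ x + b

def predict(history, w, b):
    """Sketch of DCT -> network -> inverse DCT -> residual connection.

    history: (T, D) array of observed poses.
    Returns a (T, D) array of predicted poses (same horizon, by assumption).
    """
    T = history.shape[0]
    dct = dct_matrix(T)
    idct = dct.T                          # orthonormal, so inverse is transpose
    coeffs = dct @ history                # temporal frequency representation
    displacement = idct @ toy_mlp(coeffs, w, b)
    # Residual prediction: the network output is treated as a displacement
    # that is added to the last observed pose.
    return history[-1:] + displacement

def loss_with_velocity(pred, target):
    """Position error plus an auxiliary velocity error (equal weights assumed)."""
    pos = np.mean(np.linalg.norm(pred - target, axis=-1))
    vel_pred = np.diff(pred, axis=0)      # frame-to-frame differences
    vel_target = np.diff(target, axis=0)
    vel = np.mean(np.linalg.norm(vel_pred - vel_target, axis=-1))
    return pos + vel
```

With all-zero weights and bias, `predict` simply repeats the last observed pose, which illustrates why the residual formulation gives the network a sensible starting point.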