2 months ago

SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation

Cai, Zhongang ; Yin, Wanqi ; Zeng, Ailing ; Wei, Chen ; Sun, Qingping ; Wang, Yanjun ; Pang, Hui En ; Mei, Haiyi ; Zhang, Mingyuan ; Zhang, Lei ; Loy, Chen Change ; Yang, Lei ; Liu, Ziwei
Abstract

Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications. Despite encouraging progress, current state-of-the-art methods still depend largely on a confined set of training datasets. In this work, we investigate scaling up EHPS towards the first generalist foundation model (dubbed SMPLer-X), with up to ViT-Huge as the backbone and training with up to 4.5M instances from diverse data sources. With big data and the large model, SMPLer-X exhibits strong performance across diverse test benchmarks and excellent transferability to even unseen environments. 1) For the data scaling, we perform a systematic investigation on 32 EHPS datasets, including a wide range of scenarios that a model trained on any single dataset cannot handle. More importantly, capitalizing on insights obtained from the extensive benchmarking process, we optimize our training scheme and select datasets that lead to a significant leap in EHPS capabilities. 2) For the model scaling, we take advantage of vision transformers to study the scaling law of model sizes in EHPS. Moreover, our finetuning strategy turns SMPLer-X into specialist models, allowing them to achieve further performance boosts. Notably, our foundation model SMPLer-X consistently delivers state-of-the-art results on seven benchmarks such as AGORA (107.2 mm NMVE), UBody (57.4 mm PVE), EgoBody (63.6 mm PVE), and EHF (62.3 mm PVE without finetuning). Homepage: https://caizhongang.github.io/projects/SMPLer-X/
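
The benchmark figures above are reported in millimetres of per-vertex error (PVE). As a rough illustration of what such a number measures, the sketch below averages the Euclidean distance between corresponding predicted and ground-truth mesh vertices. The function name `per_vertex_error` and the toy vertices are hypothetical, and the sketch assumes both meshes are already expressed in millimetres and aligned as the benchmark's protocol requires; it is not the evaluation code used by the authors.

```python
import math


def per_vertex_error(pred_vertices, gt_vertices):
    """Mean Euclidean distance between corresponding vertices.

    Both arguments are sequences of (x, y, z) tuples in the same unit
    (assumed here to be millimetres) and in the same vertex order.
    """
    if len(pred_vertices) != len(gt_vertices) or not pred_vertices:
        raise ValueError("vertex lists must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(pred_vertices, gt_vertices))
    return total / len(pred_vertices)


# Toy example with two vertices: the errors are 3 mm and 4 mm.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
pve_mm = per_vertex_error(pred, gt)
```

On a real benchmark the average would run over every mesh vertex and every test instance, typically after an alignment step defined by that benchmark.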
