Is Diversity All You Need for Scalable Robotic Manipulation?

Data scaling has driven remarkable success in foundation models for Natural Language Processing (NLP) and Computer Vision (CV), yet the principles of effective data scaling in robotic manipulation remain insufficiently understood. In this work, we investigate the nuanced role of data diversity in robot learning by examining three critical dimensions: task (what to do), embodiment (which robot to use), and expert (who demonstrates), challenging the conventional intuition that "more diverse is better". Through extensive experiments on various robot platforms, we reveal that (1) task diversity proves more critical than per-task demonstration quantity, benefiting transfer from diverse pre-training tasks to novel downstream scenarios; (2) multi-embodiment pre-training data is optional for cross-embodiment transfer: models trained on high-quality single-embodiment data can efficiently transfer to different platforms, exhibiting more desirable scaling properties during fine-tuning than multi-embodiment pre-trained models; and (3) expert diversity, arising from individual operational preferences and stochastic variations in human demonstrations, can confound policy learning, with velocity multimodality emerging as a key contributing factor. Building on this insight, we propose a distribution debiasing method to mitigate velocity ambiguity; the resulting model, GO-1-Pro, achieves substantial performance gains of 15%, equivalent to using 2.5 times the pre-training data. Collectively, these findings provide new perspectives and offer practical guidance on how to scale robotic manipulation datasets effectively.