DualPoseNet: Category-level 6D Object Pose and Size Estimation Using Dual Pose Network with Refined Learning of Pose Consistency

Category-level 6D object pose and size estimation aims to predict the full pose configuration of rotation, translation, and size for object instances observed in single, arbitrary views of cluttered scenes. In this paper, we propose a new method of Dual Pose Network with refined learning of pose consistency for this task, shortened as DualPoseNet. DualPoseNet stacks two parallel pose decoders on top of a shared pose encoder, where the implicit decoder predicts object poses with a working mechanism different from that of the explicit one; they thus impose complementary supervision on the training of the pose encoder. We construct the encoder based on spherical convolutions, and design a module of Spherical Fusion therein for a better embedding of pose-sensitive features from the appearance and shape observations. Although no CAD models are available during testing, the novel introduction of the implicit decoder enables refined pose prediction at test time, by enforcing consistency between the poses predicted by the two decoders using a self-adaptive loss term. Thorough experiments on benchmarks of both category- and instance-level object pose datasets confirm the efficacy of our designs. DualPoseNet outperforms existing methods by a large margin in the regime of high precision. Our code is released publicly at https://github.com/Gorilla-Lab-SCUT/DualPoseNet.
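To make the dual-decoder idea and the test-time consistency refinement concrete, below is a minimal PyTorch-style sketch. It is an illustration under assumptions, not the released implementation: the module names, feature dimensions, the 9-value rotation parameterization, and the helper `consistency_loss` are all hypothetical, and the real encoder (spherical convolutions with Spherical Fusion) is abstracted away as a precomputed feature vector.

```python
# Illustrative sketch of DualPoseNet's dual-decoder layout and test-time
# consistency refinement, as described in the abstract. All names and
# parameterizations here are assumptions for exposition only.
import torch
import torch.nn as nn


class ExplicitDecoder(nn.Module):
    """Regresses an explicit pose: rotation R, translation t, and size s."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                  nn.Linear(128, 9 + 3 + 1))

    def forward(self, feat):
        out = self.head(feat)
        R = out[:, :9].view(-1, 3, 3)   # a real model would orthonormalize R
        t = out[:, 9:12]                # translation
        s = out[:, 12:].abs() + 1e-6    # positive scalar size
        return R, t, s


class ImplicitDecoder(nn.Module):
    """Maps each observed point to canonical (pose-free) coordinates, so the
    pose is encoded implicitly rather than as explicit parameters."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(feat_dim + 3, 128), nn.ReLU(),
                                  nn.Linear(128, 3))

    def forward(self, feat, pts):       # pts: (B, N, 3) camera-frame points
        f = feat.unsqueeze(1).expand(-1, pts.shape[1], -1)
        return self.head(torch.cat([f, pts], dim=-1))


def consistency_loss(R, t, s, canonical, pts):
    """Consistency between the two decoders: the explicit pose, applied
    inversely to the observed points, should agree with the implicit
    canonical prediction. Minimizable at test time without CAD models."""
    back_projected = torch.einsum('bij,bnj->bni', R.transpose(1, 2),
                                  pts - t.unsqueeze(1)) / s.view(-1, 1, 1)
    return ((back_projected - canonical) ** 2).mean()


# Toy usage: during refinement, one would freeze both decoders and take a few
# gradient steps on the (shared) encoder alone to minimize this loss.
feat = torch.randn(2, 256)              # stand-in for encoder output
pts = torch.randn(2, 1024, 3)
explicit, implicit = ExplicitDecoder(), ImplicitDecoder()
R, t, s = explicit(feat)
canonical = implicit(feat, pts)
loss = consistency_loss(R, t, s, canonical, pts)
```

The key design point the sketch mirrors is that the two decoders read the same encoder feature but predict the pose through different mechanisms, so their disagreement provides a supervision signal even at test time, when no ground-truth pose or CAD model is available.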