GPV-Pose: Category-level Object Pose Estimation via Geometry-guided Point-wise Voting

While 6D object pose estimation has recently made a huge leap forward, most methods can still only handle a single or a handful of different objects, which limits their applications. To circumvent this problem, category-level object pose estimation has recently been revamped, which aims at predicting the 6D pose as well as the 3D metric size for previously unseen instances from a given set of object classes. This is, however, a much more challenging task due to severe intra-class shape variations. To address this issue, we propose GPV-Pose, a novel framework for robust category-level pose estimation, harnessing geometric insights to enhance the learning of category-level pose-sensitive features. First, we introduce a decoupled confidence-driven rotation representation, which allows geometry-aware recovery of the associated rotation matrix. Second, we propose a novel geometry-guided point-wise voting paradigm for robust retrieval of the 3D object bounding box. Finally, leveraging these different output streams, we can enforce several geometric consistency terms, further increasing performance, especially for non-symmetric categories. GPV-Pose produces superior results to state-of-the-art competitors on common public benchmarks, whilst almost achieving real-time inference speed at 20 FPS.