Iterative Corresponding Geometry: Fusing Region and Depth for Highly Efficient 3D Tracking of Textureless Objects

Tracking objects in 3D space and predicting their 6DoF pose is an essential task in computer vision. State-of-the-art approaches often rely on object texture to tackle this problem. However, while they achieve impressive results, many objects do not contain sufficient texture, violating the main underlying assumption. In the following, we thus propose ICG, a novel probabilistic tracker that fuses region and depth information and only requires the object geometry. Our method deploys correspondence lines and points to iteratively refine the pose. We also implement robust occlusion handling to improve performance in real-world settings. Experiments on the YCB-Video, OPT, and Choi datasets demonstrate that, even for textured objects, our approach outperforms the current state of the art with respect to accuracy and robustness. At the same time, ICG shows fast convergence and outstanding efficiency, requiring only 1.3 ms per frame on a single CPU core. Finally, we analyze the influence of individual components and discuss our performance compared to deep learning-based methods. The source code of our tracker is publicly available.