HOnnotate: A method for 3D Annotation of Hand and Object Poses
We propose a method for annotating images of a hand manipulating an object with the 3D poses of both the hand and the object, together with a dataset created using this method. Our motivation is the current lack of annotated real images for this problem, as estimating the 3D poses is challenging, mostly because of the mutual occlusions between the hand and the object. To tackle this challenge, we capture sequences with one or several RGB-D cameras and jointly optimize the 3D hand and object poses over all the frames simultaneously. This method allows us to automatically annotate each frame with accurate estimates of the poses, despite large mutual occlusions. With this method, we created HO-3D, the first markerless dataset of color images with 3D annotations for both the hand and the object. The dataset currently comprises 77,558 frames across 68 sequences, 10 persons, and 10 objects. Using our dataset, we develop a method that predicts the hand pose from a single RGB image when the hand interacts with objects under severe occlusions, and show that it generalizes to objects not seen in the dataset.
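The key idea of optimizing poses jointly over all frames, rather than frame by frame, can be illustrated with a minimal toy sketch. This is not the paper's actual formulation (which optimizes full 3D hand and object poses against RGB-D observations); here we use a hypothetical 1-D "pose" per frame, treat occluded frames as having no observation, and minimize a data term plus a temporal-smoothness term over the whole sequence at once. The function name `joint_optimize` and the weight `lam` are illustrative choices, not from the paper.

```python
import numpy as np

def joint_optimize(obs, observed_mask, lam=10.0):
    """Jointly estimate a 1-D pose x_t for every frame t by minimizing
        sum over observed t of (x_t - obs_t)^2
      + lam * sum over t>0 of (x_t - x_{t-1})^2
    over all frames simultaneously (toy stand-in for batch pose optimization).
    The objective is quadratic, so we solve the normal equations A x = b."""
    T = len(obs)
    A = np.zeros((T, T))
    b = np.zeros(T)
    for t in range(T):
        if observed_mask[t]:
            # Data term: pull x_t toward its observation where one exists.
            A[t, t] += 1.0
            b[t] += obs[t]
        if t > 0:
            # Smoothness term couples consecutive frames; this is what lets
            # occluded frames inherit information from their neighbors.
            A[t, t] += lam
            A[t - 1, t - 1] += lam
            A[t, t - 1] -= lam
            A[t - 1, t] -= lam
    return np.linalg.solve(A, b)

# Frames 2 and 3 are "occluded": no usable observation there.
obs = np.array([0.0, 1.0, 0.0, 0.0, 4.0, 5.0])
mask = np.array([True, True, False, False, True, True])
poses = joint_optimize(obs, mask)
# The occluded frames receive plausible interpolated poses, which is the
# intuition behind annotating heavily occluded frames via batch optimization.
```

Because every pose variable appears in the same linear system, information propagates through the smoothness links, so frames with no reliable observation are still assigned consistent poses; a per-frame estimator would have nothing to anchor them.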