JSIS3D: Joint Semantic-Instance Segmentation of 3D Point Clouds with Multi-Task Pointwise Networks and Multi-Value Conditional Random Fields

Deep learning techniques have become the go-to models for most vision-related tasks on 2D images. However, their power has not been fully realised on several tasks in 3D space, e.g., 3D scene understanding. In this work, we jointly address the problems of semantic and instance segmentation of 3D point clouds. Specifically, we develop a multi-task pointwise network that simultaneously performs two tasks: predicting the semantic classes of 3D points and embedding the points into high-dimensional vectors so that points of the same object instance are represented by similar embeddings. We then propose a multi-value conditional random field model to incorporate the semantic and instance labels and formulate the problem of semantic and instance segmentation as jointly optimising labels in the field model. The proposed method is thoroughly evaluated and compared with existing methods on different indoor scene datasets including S3DIS and SceneNN. Experimental results show that the proposed joint semantic-instance segmentation scheme is more robust than its individual components. Our method also achieves state-of-the-art performance on semantic segmentation.
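To make the idea of the two pointwise outputs concrete, the following is a minimal illustrative sketch, not the method proposed in this work. It assumes hypothetical per-point class scores and embeddings, groups points greedily by embedding distance, and then reconciles semantic labels within each group by majority vote. The majority vote is only a crude stand-in for the joint optimisation performed by the multi-value conditional random field; all function names, the threshold, and the toy values are assumptions made for illustration.

```python
import math
from collections import Counter


def argmax(values):
    """Index of the largest value in a non-empty sequence."""
    return max(range(len(values)), key=lambda i: values[i])


def group_by_embedding(embeddings, threshold):
    """Greedy grouping (an assumption, not the paper's procedure): a point
    joins the first instance whose seed embedding lies within `threshold`
    in Euclidean distance, otherwise it starts a new instance."""
    seeds = []          # one seed embedding per instance found so far
    instance_ids = []
    for emb in embeddings:
        for inst, seed in enumerate(seeds):
            if math.dist(emb, seed) <= threshold:
                instance_ids.append(inst)
                break
        else:
            seeds.append(emb)
            instance_ids.append(len(seeds) - 1)
    return instance_ids


def majority_semantic_label(semantic_labels, instance_ids):
    """Give every point the most common semantic label of its instance."""
    votes = {}
    for label, inst in zip(semantic_labels, instance_ids):
        votes.setdefault(inst, Counter())[label] += 1
    winner = {inst: c.most_common(1)[0][0] for inst, c in votes.items()}
    return [winner[inst] for inst in instance_ids]


# Toy, made-up network outputs for five points:
# per-class scores (two classes) and 2-D embeddings.
scores = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.2, 0.8], [0.3, 0.7]]
embeddings = [[0.0, 0.0], [0.1, 0.0], [0.05, 0.1], [5.0, 5.0], [5.1, 4.9]]

semantic = [argmax(s) for s in scores]                     # independent semantic labels
instances = group_by_embedding(embeddings, threshold=1.0)  # instance ids from embeddings
refined = majority_semantic_label(semantic, instances)     # labels made consistent per instance
```

In this toy example the third point is initially assigned a different class from the two points it is grouped with, and the per-instance vote brings it into agreement with them, which hints at why treating the two label types together can help.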