
Mask4Former: Mask Transformer for 4D Panoptic Segmentation

Yilmaz, Kadir ; Schult, Jonas ; Nekrasov, Alexey ; Leibe, Bastian
Abstract

Accurately perceiving and tracking instances over time is essential for the decision-making processes of autonomous agents interacting safely in dynamic environments. With this intention, we propose Mask4Former for the challenging task of 4D panoptic segmentation of LiDAR point clouds. Mask4Former is the first transformer-based approach unifying semantic instance segmentation and tracking of sparse and irregular sequences of 3D point clouds into a single joint model. Our model directly predicts semantic instances and their temporal associations without relying on hand-crafted non-learned association strategies such as probabilistic clustering or voting-based center prediction. Instead, Mask4Former introduces spatio-temporal instance queries that encode the semantic and geometric properties of each semantic tracklet in the sequence. In an in-depth study, we find that promoting spatially compact instance predictions is critical as spatio-temporal instance queries tend to merge multiple semantically similar instances, even if they are spatially distant. To this end, we regress 6-DOF bounding box parameters from spatio-temporal instance queries, which are used as an auxiliary task to foster spatially compact predictions. Mask4Former achieves a new state-of-the-art on the SemanticKITTI test set with a score of 68.4 LSTQ.
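
To make the idea of spatio-temporal instance queries more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that the points of several consecutive scans are stacked into one array, that each query yields a per-point mask via a dot product with point features (a common mask-transformer design, assumed here), and that "6-DOF box parameters" can be read as the center and extent of an axis-aligned box. The function names `assign_points_to_queries` and `axis_aligned_box` are hypothetical.

```python
import numpy as np

def assign_points_to_queries(point_feats, query_embs, query_class_logits):
    """Assign every point of a stacked multi-scan cloud to one query.

    point_feats:        (N, D) features of all points from all scans (assumed given).
    query_embs:         (Q, D) one embedding per spatio-temporal query.
    query_class_logits: (Q, C) semantic class logits per query.
    Returns per-point query ids (N,) and per-point semantic labels (N,).
    """
    # Mask logits: similarity between each point and each query.
    mask_logits = point_feats @ query_embs.T              # (N, Q)
    mask_probs = 1.0 / (1.0 + np.exp(-mask_logits))       # sigmoid

    # Softmax over classes, computed stably.
    shifted = query_class_logits - query_class_logits.max(axis=1, keepdims=True)
    class_probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    query_conf = class_probs.max(axis=1)                  # (Q,)
    query_label = class_probs.argmax(axis=1)              # (Q,)

    # Each point goes to the query with the highest combined score.
    # Because points from all scans share the same queries, the query id
    # doubles as a track id: no separate association step is needed.
    instance_id = (mask_probs * query_conf[None, :]).argmax(axis=1)
    return instance_id, query_label[instance_id]

def axis_aligned_box(points_xyz):
    """Box target for one tracklet: (cx, cy, cz, sx, sy, sz).

    Assumption: a 6-parameter axis-aligned box stands in for the
    '6-DOF bounding box parameters' mentioned in the abstract.
    """
    lo, hi = points_xyz.min(axis=0), points_xyz.max(axis=0)
    return np.concatenate([(lo + hi) / 2.0, hi - lo])
```

In this sketch, a tracklet whose mask covers two distant objects would produce a very large box, which suggests why a box regression target could discourage such merged predictions. How the actual model supervises the boxes is not specified in the abstract.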
