4D-StOP: Panoptic Segmentation of 4D LiDAR using Spatio-temporal Object Proposal Generation and Aggregation

In this work, we present a new paradigm, called 4D-StOP, to tackle the task of 4D Panoptic LiDAR Segmentation. 4D-StOP first generates spatio-temporal proposals using voting-based center predictions, where each point in the 4D volume votes for a corresponding center. These tracklet proposals are then aggregated using learned geometric features. The tracklet aggregation method effectively generates a video-level 4D scene representation over the entire space-time volume. This is in contrast to existing end-to-end trainable state-of-the-art approaches, which use spatio-temporal embeddings represented by Gaussian probability distributions. Our voting-based tracklet generation, followed by geometric feature-based aggregation, yields significantly better panoptic LiDAR segmentation quality than modeling the entire 4D volume with Gaussian probability distributions. 4D-StOP achieves a new state of the art on the SemanticKITTI test dataset with a score of 63.9 LSTQ, a large (+7%) improvement over the current best-performing end-to-end trainable methods. The code and pre-trained models are available at: https://github.com/LarsKreuzberg/4D-StOP.
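The two-stage idea (points vote for centers, votes are grouped into tracklet proposals, proposals are aggregated by a geometric criterion) can be illustrated with a minimal sketch. It makes several assumptions, none of which come from the 4D-StOP code base. The scans are taken to be already aligned in a common coordinate frame. Each point comes with a predicted 3D offset to its object center: in the actual method this is a network output, but here the offsets are simply given. A greedy radius grouping of votes stands in for the paper's proposal generation. The mean vote position of each proposal stands in for the learned geometric features used during aggregation. All function names (`cast_votes`, `group_votes`, `aggregate`) and both radii are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point4D = Tuple[float, float, float, float]  # (x, y, z, t), scans assumed pre-aligned
Offset3D = Tuple[float, float, float]        # hypothetical predicted offset to object center


def cast_votes(points: Sequence[Point4D], offsets: Sequence[Offset3D]) -> List[Point4D]:
    """Each point votes for a center by adding its spatial offset; the time stamp is kept."""
    return [(x + dx, y + dy, z + dz, t)
            for (x, y, z, t), (dx, dy, dz) in zip(points, offsets)]


def group_votes(votes: Sequence[Point4D], radius: float) -> List[List[int]]:
    """Greedy radius grouping (a stand-in, not the paper's proposal step).
    Each unassigned vote seeds a proposal and absorbs all unassigned votes
    whose spatial distance to the seed is at most `radius`."""
    assigned = [False] * len(votes)
    proposals: List[List[int]] = []
    for i, seed in enumerate(votes):
        if assigned[i]:
            continue
        members = [j for j, v in enumerate(votes)
                   if not assigned[j] and math.dist(seed[:3], v[:3]) <= radius]
        for j in members:
            assigned[j] = True
        proposals.append(members)
    return proposals


def proposal_center(votes: Sequence[Point4D], members: List[int]) -> Tuple[float, ...]:
    """Mean spatial vote position: a simple substitute for learned geometric features."""
    n = len(members)
    return tuple(sum(votes[j][k] for j in members) / n for k in range(3))


def aggregate(votes: Sequence[Point4D], proposals: List[List[int]],
              merge_radius: float) -> List[int]:
    """Merge proposals whose centers lie within `merge_radius` (union-find) and
    return one instance id per point, shared across time steps."""
    centers = [proposal_center(votes, m) for m in proposals]
    parent = list(range(len(proposals)))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a in range(len(proposals)):
        for b in range(a + 1, len(proposals)):
            if math.dist(centers[a], centers[b]) <= merge_radius:
                parent[find(a)] = find(b)

    labels = [-1] * len(votes)
    root_to_id: dict = {}
    for p, members in enumerate(proposals):
        inst = root_to_id.setdefault(find(p), len(root_to_id))
        for j in members:
            labels[j] = inst
    return labels


# Toy input: object A moves from x=0 to x=1.5 between t=0 and t=1; object B stays at x=10.
points = [(0.5, 0.0, 0.0, 0.0), (-0.4, 0.2, 0.0, 0.0), (2.0, 0.0, 0.0, 1.0),
          (10.5, 0.0, 0.0, 0.0), (9.6, 0.0, 0.0, 1.0)]
offsets = [(-0.5, 0.0, 0.0), (0.4, -0.2, 0.0), (-0.5, 0.0, 0.0),
           (-0.5, 0.0, 0.0), (0.4, 0.0, 0.0)]

votes = cast_votes(points, offsets)
proposals = group_votes(votes, radius=1.0)
labels = aggregate(votes, proposals, merge_radius=2.0)
print(labels)  # → [0, 0, 0, 1, 1]
```

In the toy input, object A moves between the two frames, so at `radius=1.0` its votes split into two proposals. The aggregation step then merges them into a single instance id spanning both time steps, while object B keeps its own id. In the actual method, learned features would replace this simple center-distance test.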