EagerMOT: 3D Multi-Object Tracking via Sensor Fusion

Multi-object tracking (MOT) enables mobile robots to perform well-informed motion planning and navigation by localizing surrounding objects in 3D space and time. Existing methods rely on depth sensors (e.g., LiDAR) to detect and track targets in 3D space, but only up to a limited sensing range due to the sparsity of the signal. On the other hand, cameras provide a dense and rich visual signal that helps to localize even distant objects, but only in the image domain. In this paper, we propose EagerMOT, a simple tracking formulation that eagerly integrates all available object observations from both sensor modalities to obtain a well-informed interpretation of the scene dynamics. Using images, we can identify distant incoming objects, while depth estimates allow for precise trajectory localization as soon as objects are within the depth-sensing range. With EagerMOT, we achieve state-of-the-art results across several MOT tasks on the KITTI and NuScenes datasets. Our code is available at https://github.com/aleksandrkim61/EagerMOT.
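
To make the fusion idea concrete, below is a minimal, hypothetical sketch of an "eager" two-stage association step in Python. It is not the released implementation (see the repository linked above for that): the `Detection` and `Track` containers, the center-distance cost, the greedy matching, and the thresholds `max_dist_3d` and `min_iou_2d` are simplified assumptions chosen for illustration. What it demonstrates is the abstract's core claim: detections that carry depth are matched to tracks in 3D first, while camera-only detections (e.g., distant incoming objects) are still used through a 2D image-plane fallback rather than discarded.

```
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    box_2d: Tuple[float, float, float, float]                # (x1, y1, x2, y2) in the image
    center_3d: Optional[Tuple[float, float, float]] = None   # None if seen by the camera only


@dataclass
class Track:
    box_2d: Tuple[float, float, float, float]
    center_3d: Optional[Tuple[float, float, float]] = None


def iou_2d(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    """Axis-aligned intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def dist_3d(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    """Euclidean distance between two 3D box centers (stand-in 3D cost)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def associate(dets: List[Detection], tracks: List[Track],
              max_dist_3d: float = 2.0, min_iou_2d: float = 0.3):
    """Greedy two-stage association: 3D space first, image plane as fallback."""
    matches: List[Tuple[int, int]] = []
    used_d, used_t = set(), set()

    # Stage 1: detections carrying 3D information are matched to tracks
    # with a 3D state, greedily by increasing center distance.
    pairs = [(dist_3d(d.center_3d, t.center_3d), i, j)
             for i, d in enumerate(dets) if d.center_3d is not None
             for j, t in enumerate(tracks) if t.center_3d is not None]
    for cost, i, j in sorted(pairs):
        if cost <= max_dist_3d and i not in used_d and j not in used_t:
            matches.append((i, j))
            used_d.add(i)
            used_t.add(j)

    # Stage 2: whatever is left (e.g., distant, camera-only detections)
    # falls back to 2D overlap, so image-only evidence still extends tracks.
    pairs_2d = [(-iou_2d(d.box_2d, t.box_2d), i, j)
                for i, d in enumerate(dets) if i not in used_d
                for j, t in enumerate(tracks) if j not in used_t]
    for neg_iou, i, j in sorted(pairs_2d):
        if -neg_iou >= min_iou_2d and i not in used_d and j not in used_t:
            matches.append((i, j))
            used_d.add(i)
            used_t.add(j)

    unmatched_dets = [i for i in range(len(dets)) if i not in used_d]
    unmatched_tracks = [j for j in range(len(tracks)) if j not in used_t]
    return matches, unmatched_dets, unmatched_tracks
```

The released code differs in its cost metrics and in how unmatched detections spawn or update tracks; this sketch only mirrors the two-stage structure implied by the abstract, in which no observation from either modality is thrown away before both association stages have run.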