MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors

In this paper, we propose MOTRv2, a simple yet effective pipeline to bootstrap end-to-end multi-object tracking with a pretrained object detector. Existing end-to-end methods, such as MOTR and TrackFormer, are inferior to their tracking-by-detection counterparts mainly due to their poor detection performance. We aim to improve MOTR by elegantly incorporating an extra object detector. We first adopt the anchor formulation of queries and then use an extra object detector to generate proposals as anchors, providing a detection prior to MOTR. This simple modification greatly eases the conflict between jointly learning the detection and association tasks in MOTR. MOTRv2 keeps the query propagation feature and scales well on large-scale benchmarks. MOTRv2 ranks 1st (73.4% HOTA on DanceTrack) in the 1st Multiple People Tracking in Group Dance Challenge. Moreover, MOTRv2 reaches state-of-the-art performance on the BDD100K dataset. We hope this simple and effective pipeline can provide some new insights to the end-to-end MOT community. Code is available at \url{https://github.com/megvii-research/MOTRv2}.
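
To make the idea of using detector proposals as query anchors concrete, the following is a minimal sketch, not the released implementation. It only illustrates how per-frame queries might be assembled from two sources: proposals from an external detector and queries propagated from the previous frame. The names `Query` and `build_queries`, the `(cx, cy, w, h)` box convention, and the `score_thresh` filter are all assumptions made for illustration and do not come from the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed box convention for this sketch: normalized (cx, cy, w, h).
Box = Tuple[float, float, float, float]


@dataclass
class Query:
    anchor: Box              # anchor box the query starts from
    track_id: Optional[int]  # None for a fresh proposal query


def build_queries(proposals: List[Tuple[Box, float]],
                  propagated: List[Query],
                  score_thresh: float = 0.05) -> List[Query]:
    """Assemble the queries for one frame (hypothetical helper).

    proposals:  (box, score) pairs from the external detector.
    propagated: queries carried over from the previous frame, which
                already hold a track identity.
    """
    # Each sufficiently confident proposal becomes the anchor of a new query.
    # The threshold is an illustrative assumption, not a documented value.
    proposal_queries = [Query(anchor=box, track_id=None)
                        for box, score in proposals
                        if score >= score_thresh]
    # Propagated queries are kept alongside the new ones, so existing
    # tracks continue while proposals cover newly appearing objects.
    return proposal_queries + propagated
```

In this sketch the detector supplies only anchors. Associating objects across frames would still be handled by the propagated queries inside the tracker, which is the division of labor the abstract describes.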