MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction

High-definition (HD) maps provide abundant and precise environmental information about the driving scene, serving as a fundamental and indispensable component for planning in autonomous driving systems. We present MapTR, a structured end-to-end Transformer for efficient online vectorized HD map construction. We propose a unified permutation-equivalent modeling approach, i.e., modeling each map element as a point set with a group of equivalent permutations, which accurately describes the shape of the map element and stabilizes the learning process. We design a hierarchical query embedding scheme to flexibly encode structured map information and perform hierarchical bipartite matching for map element learning. With only camera input, MapTR achieves the best performance and efficiency among existing vectorized map construction approaches on the nuScenes dataset. In particular, MapTR-nano runs at real-time inference speed ($25.1$ FPS) on an RTX 3090, $8\times$ faster than the existing state-of-the-art camera-based method, while achieving $5.0$ higher mAP. Even compared with the existing state-of-the-art multi-modality method, MapTR-nano achieves $0.7$ higher mAP, and MapTR-tiny achieves $13.5$ higher mAP and $3\times$ faster inference speed. Abundant qualitative results show that MapTR maintains stable and robust map construction quality in complex and varied driving scenes. MapTR is of great application value in autonomous driving. Code and more demos are available at \url{https://github.com/hustvl/MapTR}.
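
To make the idea of a point set with a group of equivalent permutations concrete, the following is a minimal Python sketch, not the released MapTR code. It assumes that an open polyline (e.g., a lane divider) has two equivalent point orderings (forward and reversed), and that a closed polygon with N vertices (e.g., a pedestrian crossing) has 2N (any starting vertex, either direction). The function names `equivalent_permutations` and `point_set_distance` are hypothetical, and the summed L1 distance is an illustrative choice of point-level cost.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def equivalent_permutations(points: Sequence[Point], closed: bool) -> List[List[Point]]:
    """Enumerate point orderings that describe the same shape.

    Assumption (illustrative): an open polyline has 2 equivalent orderings,
    a closed polygon with N vertices has 2 * N.
    """
    pts = list(points)
    if not closed:
        # Forward and reversed traversal of the same polyline.
        return [pts, pts[::-1]]
    n = len(pts)
    perms: List[List[Point]] = []
    for direction in (pts, pts[::-1]):
        for start in range(n):
            # Rotate the vertex list so that it begins at index `start`.
            perms.append(direction[start:] + direction[:start])
    return perms


def point_set_distance(pred: Sequence[Point], target: Sequence[Point], closed: bool) -> float:
    """Smallest summed L1 distance between `pred` and any equivalent ordering of `target`."""
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of points")

    def l1(a: Sequence[Point], b: Sequence[Point]) -> float:
        return sum(abs(ax - bx) + abs(ay - by) for (ax, ay), (bx, by) in zip(a, b))

    return min(l1(pred, perm) for perm in equivalent_permutations(target, closed))


# Usage: a prediction that traverses the same shape in a different order has zero cost.
lane = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
crossing = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

print(len(equivalent_permutations(lane, closed=False)))      # 2
print(len(equivalent_permutations(crossing, closed=True)))   # 8
print(point_set_distance(lane[::-1], lane, closed=False))    # 0.0
```

Taking the minimum over the equivalent orderings means a prediction is not penalized for choosing a different but equally valid start point or direction, which is one plausible reading of why such modeling stabilizes learning. In a hierarchical matching scheme, a cost of this kind would be evaluated after predicted and ground-truth elements have been paired at the instance level.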