Abstract

3D lane detection from monocular images is a fundamental yet challenging task in autonomous driving. Recent advances primarily rely on structural 3D surrogates (e.g., bird's eye view) built from front-view image features and camera parameters. However, the depth ambiguity in monocular images inevitably causes misalignment between the constructed surrogate feature map and the original image, posing a great challenge for accurate lane detection. To address this issue, we present LATR, a novel end-to-end 3D lane detector that uses 3D-aware front-view features without a transformed view representation. Specifically, LATR detects 3D lanes via cross-attention between queries and key-value pairs, constructed using our lane-aware query generator and dynamic 3D ground positional embedding. On the one hand, each query is generated from 2D lane-aware features and adopts a hybrid embedding to enhance lane information. On the other hand, 3D spatial information is injected as a positional embedding derived from an iteratively updated 3D ground plane. LATR outperforms previous state-of-the-art methods by large margins on the synthetic Apollo dataset and the real-world OpenLane and ONCE-3DLanes datasets (e.g., a gain of 11.4 in F1 score on OpenLane). Code will be released at https://github.com/JMoonr/LATR.
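The idea of attaching 3D ground information to front-view features as a positional embedding can be illustrated with a small NumPy sketch. It is a simplified illustration under stated assumptions, not LATR's implementation: every pixel ray is intersected with a ground plane `{X : normal . X = d}` in camera coordinates, the resulting 3D points are encoded with a sinusoidal embedding, and that embedding is added to the keys of a plain single-head cross-attention in which lane queries attend to image features. The function names, the sinusoidal encoding, the camera intrinsics `K`, and the random features and queries are all hypothetical. The plane is also held fixed here, whereas the abstract says LATR updates it iteratively.

```python
import numpy as np

def ray_plane_points(K, H, W, normal, d):
    """Intersect the camera ray through every pixel center of an H x W image
    with the plane {X : normal . X = d}, expressed in camera coordinates."""
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u + 0.5, v + 0.5, np.ones((H, W))], axis=-1)  # homogeneous pixels
    rays = pix.reshape(-1, 3) @ np.linalg.inv(K).T               # ray directions, (H*W, 3)
    denom = rays @ normal
    # Guard against rays parallel to the plane. Rays above the horizon get
    # t < 0 (points behind the camera); a real model would mask or clamp them.
    t = d / np.where(np.abs(denom) < 1e-6, 1e-6, denom)
    return rays * t[:, None]                                      # 3D points, (H*W, 3)

def sine_embed(points, num_freqs):
    """Hypothetical sinusoidal encoding: (N, 3) points -> (N, 6 * num_freqs)."""
    freqs = 2.0 ** np.arange(num_freqs)
    angles = points[:, :, None] * freqs                           # (N, 3, F)
    emb = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return emb.reshape(points.shape[0], -1)

def cross_attention(queries, feats, pos_emb):
    """Single-head cross-attention: positional embedding is added to keys only."""
    keys = feats + pos_emb
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)                  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ feats, attn                                     # values are the raw features

rng = np.random.default_rng(0)
H, W, F = 8, 16, 4
C = 6 * F                                                         # feature / embedding width
K = np.array([[20.0, 0.0, W / 2], [0.0, 20.0, H / 2], [0.0, 0.0, 1.0]])
normal = np.array([0.0, 1.0, 0.0])                                # y axis points down: ground below camera
points = ray_plane_points(K, H, W, normal, d=1.5)                 # camera assumed 1.5 m above ground
pos = sine_embed(points, F)
feats = rng.standard_normal((H * W, C))                           # stand-in front-view features
queries = rng.standard_normal((5, C))                             # stand-in for 5 lane queries
out, attn = cross_attention(queries, feats, pos)
print(out.shape, attn.shape)  # (5, 24) (5, 128)
```

Adding the embedding to the keys but not the values mirrors the common DETR-style convention: geometry influences *where* each query attends, while the aggregated content stays purely visual.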

Source PDF View Code