RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception

Concurrent processing of multiple autonomous driving 3D perception tasks within the same spatiotemporal scene poses a significant challenge, in particular due to the computational inefficiencies and feature competition between tasks when using traditional multi-task learning approaches. This paper addresses these issues by proposing a novel unified representation, RepVF, which harmonizes the representation of various perception tasks such as 3D object detection and 3D lane detection within a single framework. RepVF characterizes the structure of different targets in the scene through a vector field, enabling a single-head, multi-task learning model that significantly reduces computational redundancy and feature competition. Building upon RepVF, we introduce RFTR, a network designed to exploit the inherent connections between different tasks by utilizing a hierarchical structure of queries that implicitly model the relationships both between and within tasks. This approach eliminates the need for task-specific heads and parameters, fundamentally reducing the conflicts inherent in traditional multi-task learning paradigms. We validate our approach by combining labels from the OpenLane dataset with the Waymo Open dataset. Our work presents a significant advancement in the efficiency and effectiveness of multi-task perception in autonomous driving, offering a new perspective on handling multiple 3D perception tasks synchronously and in parallel. The code will be available at: https://github.com/jbji/RepVF
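
To make the unified-representation idea concrete, below is a minimal, hypothetical sketch of how both a 3D bounding box and a 3D lane could be expressed as the same kind of tensor: a set of 3D points, each carrying a unit direction vector. The function names (`box_to_vector_field`, `lane_to_vector_field`), the fixed (N, 6) layout, and the choice of directions (box heading, lane tangent) are illustrative assumptions, not the paper's actual parameterization; the point is only that one output format, and hence one prediction head, can serve both tasks.

```python
import numpy as np

def box_to_vector_field(center, size, yaw):
    """Hypothetical: sample a 3D box as 8 corner points, each paired with
    the box's heading direction, yielding a (8, 6) vector-field tensor."""
    l, w, h = size
    # 8 corner offsets in the box's local frame
    signs = np.array([[sx, sy, sz]
                      for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    corners = signs * np.array([l, w, h]) / 2.0
    # Rotation about the z-axis by the yaw angle
    rot = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
                    [np.sin(yaw),  np.cos(yaw), 0.0],
                    [0.0,          0.0,         1.0]])
    points = corners @ rot.T + center
    heading = rot[:, 0]  # unit vector along the box's length axis
    dirs = np.tile(heading, (len(points), 1))
    return np.concatenate([points, dirs], axis=1)  # (8, 6): position + direction

def lane_to_vector_field(polyline):
    """Hypothetical: sample a 3D lane polyline as points paired with their
    local tangent directions, yielding an (N, 6) vector-field tensor."""
    diffs = np.gradient(polyline, axis=0)
    tangents = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)
    return np.concatenate([polyline, tangents], axis=1)  # (N, 6)

# Both targets now share the same (N, 6) format, so a single head can predict them.
box_vf = box_to_vector_field(center=np.array([10.0, 2.0, 0.5]),
                             size=(4.5, 1.9, 1.6), yaw=0.3)
lane_vf = lane_to_vector_field(np.array([[0.0, 0.0, 0.0],
                                         [5.0, 0.2, 0.0],
                                         [10.0, 0.8, 0.1]]))
```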