
You Only Look at Once for Real-time and Generic Multi-Task

Wang, Jiayuan; Wu, Q. M. Jonathan; Zhang, Ning
Abstract

High precision, lightweight design, and real-time responsiveness are three essential requirements for implementing autonomous driving. In this study, we present A-YOLOM, an adaptive, real-time, and lightweight multi-task model designed to concurrently address object detection, drivable area segmentation, and lane line segmentation. Specifically, we develop an end-to-end multi-task model with a unified and streamlined segmentation structure. We introduce a learnable parameter that adaptively concatenates features between the neck and backbone in segmentation tasks, using the same loss function for all segmentation tasks. This eliminates the need for customization and enhances the model's generalization capabilities. We also introduce a segmentation head composed only of a series of convolutional layers, which reduces the number of parameters and inference time. We achieve competitive results on the BDD100k dataset, particularly in visualization outcomes. The performance results show a mAP50 of 81.1% for object detection, a mIoU of 91.0% for drivable area segmentation, and an IoU of 28.8% for lane line segmentation. Additionally, we introduce real-world scenarios to evaluate our model's performance in real scenes, where it significantly outperforms competitors. This demonstrates that our model not only exhibits competitive performance but is also more flexible and faster than existing multi-task models. The source code and pre-trained models are released at https://github.com/JiayuanWang-JW/YOLOv8-multi-task
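The abstract names two architectural ideas that are easy to picture in code: a learnable parameter that adaptively concatenates backbone and neck features, and a segmentation head built only from convolutional layers. Below is a minimal PyTorch sketch of both; the gating scheme (a sigmoid-squashed scalar), the module names `AdaptiveConcat` and `SegHead`, and all channel sizes are assumptions for illustration, not the released implementation.

```python
import torch
import torch.nn as nn

class AdaptiveConcat(nn.Module):
    """Concatenate a backbone feature with a neck feature, weighting the
    backbone branch by a learnable scalar gate (illustrative sketch,
    not the authors' exact formulation)."""
    def __init__(self):
        super().__init__()
        # Learnable parameter; a sigmoid keeps the effective weight in (0, 1).
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, backbone_feat, neck_feat):
        w = torch.sigmoid(self.alpha)
        return torch.cat([w * backbone_feat, neck_feat], dim=1)

class SegHead(nn.Module):
    """Segmentation head composed only of convolutional layers, as the
    abstract describes; depths and channel widths are assumptions."""
    def __init__(self, in_ch, num_classes=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, in_ch // 2, kernel_size=3, padding=1),
            nn.BatchNorm2d(in_ch // 2),
            nn.SiLU(),
            nn.Conv2d(in_ch // 2, num_classes, kernel_size=1),
        )

    def forward(self, x):
        return self.head(x)

# Usage: fuse a backbone map with a same-resolution neck map, then
# predict a per-pixel segmentation logit map.
backbone_feat = torch.randn(1, 64, 80, 80)
neck_feat = torch.randn(1, 64, 80, 80)
fused = AdaptiveConcat()(backbone_feat, neck_feat)  # -> (1, 128, 80, 80)
mask_logits = SegHead(128)(fused)                   # -> (1, 2, 80, 80)
```

Because the gate is learned end-to-end with a shared segmentation loss, the same fusion block can serve both drivable-area and lane-line branches without per-task customization, which is the flexibility the abstract claims.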
