A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird's Eye View

Accurate environment perception is essential for automated driving. When using monocular cameras, the distance estimation of elements in the environment poses a major challenge. Distances can be more easily estimated when the camera perspective is transformed to a bird's eye view (BEV). For flat surfaces, Inverse Perspective Mapping (IPM) can accurately transform images to a BEV. Three-dimensional objects such as vehicles and vulnerable road users are distorted by this transformation, making it difficult to estimate their position relative to the sensor. This paper describes a methodology to obtain a corrected 360° BEV image given images from multiple vehicle-mounted cameras. The corrected BEV image is segmented into semantic classes and includes a prediction of occluded areas. The neural network approach does not rely on manually labeled data, but is trained on a synthetic dataset in such a way that it generalizes well to real-world data. By using semantically segmented images as input, we reduce the reality gap between simulated and real-world data and are able to show that our method can be successfully applied in the real world. Extensive experiments conducted on the synthetic data demonstrate the superiority of our approach compared to IPM. Source code and datasets are available at https://github.com/ika-rwth-aachen/Cam2BEV.
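For context, the classical IPM baseline referred to above can be expressed as a planar homography that maps ground-plane points into the camera image. The following Python sketch using OpenCV illustrates this geometric transformation for a single camera; the intrinsics, extrinsics, BEV resolution, and file name are hypothetical placeholders, and the sketch is only the IPM baseline, not the learned Cam2BEV method from the linked repository.

```python
# Minimal sketch of classical Inverse Perspective Mapping (IPM) via a planar
# homography. All camera parameters below are hypothetical examples.
import numpy as np
import cv2

def ipm_homography(K, R, t, bev_scale, bev_size):
    """Homography mapping BEV pixels to image pixels for the ground plane (Z=0).

    K: 3x3 camera intrinsics; R, t: world-to-camera rotation and translation;
    bev_scale: BEV pixels per metre; bev_size: (width, height) of the BEV image.
    """
    # Ground-plane projection: [u, v, 1]^T ~ K [r1 r2 t] [X, Y, 1]^T
    P = K @ np.column_stack((R[:, 0], R[:, 1], t))
    # BEV pixel -> ground-plane metres (centred, hypothetical scale)
    M = np.array([[1.0 / bev_scale, 0.0, -bev_size[0] / (2.0 * bev_scale)],
                  [0.0, 1.0 / bev_scale, -bev_size[1] / (2.0 * bev_scale)],
                  [0.0, 0.0, 1.0]])
    return P @ M  # maps BEV pixel -> image pixel

# Example usage with assumed camera parameters (not taken from the paper):
K = np.array([[800.0, 0.0, 640.0],
              [0.0, 800.0, 480.0],
              [0.0, 0.0, 1.0]])                      # intrinsics
R = cv2.Rodrigues(np.array([-np.pi / 2, 0.0, 0.0]))[0]  # camera pitched forward
t = np.array([0.0, 0.0, 1.5])                        # mounting height [m]

H_img_from_bev = ipm_homography(K, R, t, bev_scale=20.0, bev_size=(512, 512))
image = cv2.imread("front_camera.png")               # placeholder input image
# Warp the camera image onto the ground plane (image -> BEV direction).
bev = cv2.warpPerspective(image, np.linalg.inv(H_img_from_bev), (512, 512))
```

Because this homography is only valid for points on the ground plane, anything with height (vehicles, pedestrians) is smeared away from the camera in the resulting BEV image, which is exactly the distortion the learned approach described in the abstract is designed to correct.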