RGB-Event Fusion for Moving Object Detection in Autonomous Driving

Moving Object Detection (MOD) is a critical vision task for successfully achieving safe autonomous driving. Despite plausible results of deep learning methods, most existing approaches are only frame-based and may fail to reach reasonable performance when dealing with dynamic traffic participants. Recent advances in sensor technologies, especially the Event camera, can naturally complement the conventional camera approach to better model moving objects. However, event-based works often adopt a pre-defined time window for event representation and simply integrate it to estimate image intensities from events, neglecting much of the rich temporal information from the available asynchronous events. Therefore, from a new perspective, we propose RENet, a novel RGB-Event fusion Network that jointly exploits the two complementary modalities to achieve more robust MOD under challenging scenarios for autonomous driving. Specifically, we first design a temporal multi-scale aggregation module to fully leverage event frames from both the RGB exposure time and larger intervals. Then we introduce a bi-directional fusion module to attentively calibrate and fuse multi-modal features. To evaluate the performance of our network, we carefully select and annotate a sub-MOD dataset from the commonly used DSEC dataset. Extensive experiments demonstrate that our proposed method performs significantly better than the state-of-the-art RGB-Event fusion alternatives. The source code and dataset are publicly available at: https://github.com/ZZY-Zhou/RENet.
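
To make the temporal multi-scale aggregation idea concrete, the sketch below builds event frames over the RGB exposure window and over progressively larger intervals, as the abstract describes. This is a minimal illustration, not the authors' implementation: the event layout, the window lengths (`scales`), and the polarity-count encoding are assumptions for the example.

```python
# Minimal sketch (assumed details, not the RENet code): accumulate asynchronous
# events into frames at multiple temporal scales around an RGB exposure.
import numpy as np

def events_to_frame(events, t_start, t_end, height, width):
    """Accumulate polarity counts of events falling in [t_start, t_end)."""
    frame = np.zeros((2, height, width), dtype=np.float32)  # channels: +/- polarity
    mask = (events["t"] >= t_start) & (events["t"] < t_end)
    x, y, p = events["x"][mask], events["y"][mask], events["p"][mask]
    np.add.at(frame[0], (y[p > 0], x[p > 0]), 1.0)    # positive events
    np.add.at(frame[1], (y[p <= 0], x[p <= 0]), 1.0)  # negative events
    return frame

def multi_scale_event_frames(events, exposure_start, exposure_end,
                             scales=(1, 2, 4), height=480, width=640):
    """Stack event frames over the exposure window and progressively
    larger intervals ending at the exposure end (temporal multi-scale)."""
    duration = exposure_end - exposure_start
    frames = [events_to_frame(events, exposure_end - s * duration,
                              exposure_end, height, width) for s in scales]
    return np.concatenate(frames, axis=0)  # shape: (2 * len(scales), H, W)

# Usage with synthetic events (structured array: timestamp, x, y, polarity).
rng = np.random.default_rng(0)
n = 10_000
events = np.zeros(n, dtype=[("t", "f8"), ("x", "i4"), ("y", "i4"), ("p", "i1")])
events["t"] = np.sort(rng.uniform(0.0, 0.04, n))  # 40 ms of events
events["x"] = rng.integers(0, 640, n)
events["y"] = rng.integers(0, 480, n)
events["p"] = rng.choice([-1, 1], n)
stack = multi_scale_event_frames(events, exposure_start=0.03, exposure_end=0.04)
print(stack.shape)  # (6, 480, 640)
```

The resulting multi-channel stack could then be processed by an event branch and fused with RGB features, e.g. by the bi-directional fusion module mentioned above; how that fusion is realized is described in the paper itself.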