Accurate and Real-time 3D Pedestrian Detection Using an Efficient Attentive Pillar Network

Efficiently and accurately detecting people from 3D point cloud data is of great importance in many robotic and autonomous driving applications. This fundamental perception task remains very challenging due to (i) significant deformations of human body pose and gesture over time and (ii) point cloud sparsity and scarcity for pedestrian-class objects. Recent efficient 3D object detection approaches rely on pillar features to detect objects from point cloud data. However, these pillar features are not expressive enough to handle the aforementioned challenges in detecting people. To address this shortcoming, we first introduce a stackable Pillar Aware Attention (PAA) module that enhances pillar feature extraction while suppressing noise in the point clouds. By integrating multi-point-channel-pooling, point-wise, channel-wise, and task-aware attention into a simple module, the representation capabilities are boosted while requiring little additional computing resources. We also present Mini-BiFPN, a small yet effective feature network that creates bidirectional information flow and multi-level cross-scale feature fusion to better integrate multi-resolution features. Our proposed framework, named PiFeNet, has been evaluated on three popular large-scale datasets for 3D pedestrian detection, i.e. KITTI, JRDB, and nuScenes, achieving state-of-the-art (SOTA) performance on KITTI bird's-eye view (BEV) and JRDB, and very competitive performance on nuScenes. Our approach runs inference at 26 frames per second (FPS), making it a real-time detector. The code for PiFeNet is available at https://github.com/ldtho/PiFeNet.
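
To make the idea of point-wise and channel-wise attention over pillar features concrete, the following is a minimal sketch in plain Python. It is not the PAA module itself: the multi-point-channel-pooling and task-aware attention components are omitted, max pooling is assumed as the descriptor, and every name here (`pillar_attention`, `init_linear`, the random weights standing in for trained ones) is hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping any real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear(vec, weights, bias):
    """Dense layer: `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def init_linear(n_in, n_out, rng):
    """Hypothetical random initialisation standing in for trained weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def pillar_attention(pillar, point_fc, channel_fc):
    """Reweight one pillar (N points x C channels) by point and channel gates."""
    n_points, n_channels = len(pillar), len(pillar[0])

    # Point descriptor: max over channels for each point -> length N.
    point_desc = [max(point) for point in pillar]
    # Channel descriptor: max over points for each channel -> length C.
    channel_desc = [max(point[c] for point in pillar)
                    for c in range(n_channels)]

    # One gate in (0, 1) per point and one per channel.
    point_gate = [sigmoid(z) for z in linear(point_desc, *point_fc)]
    channel_gate = [sigmoid(z) for z in linear(channel_desc, *channel_fc)]

    # Apply both gates across the N x C grid of features.
    return [[pillar[i][c] * point_gate[i] * channel_gate[c]
             for c in range(n_channels)]
            for i in range(n_points)]


# Toy pillar: 4 points with 3 feature channels each (assumed sizes).
rng = random.Random(0)
N, C = 4, 3
pillar = [[rng.uniform(-1.0, 1.0) for _ in range(C)] for _ in range(N)]
point_fc = init_linear(N, N, rng)
channel_fc = init_linear(C, C, rng)
attended = pillar_attention(pillar, point_fc, channel_fc)
```

Because both gates lie strictly between 0 and 1, the sketch can only attenuate features, which mirrors the stated goal of suppressing noisy points and uninformative channels. The output keeps the input shape, which is the property that would let such a block be stacked.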