Multi-View People Detection in Large Scenes via Supervised View-Wise Contribution Weighting

Recent deep learning-based multi-view people detection (MVD) methods have shown promising results on existing datasets. However, current methods are mainly trained and evaluated on small, single scenes with a limited number of multi-view frames and fixed camera views. As a result, these methods may not be practical for detecting people in larger, more complex scenes with severe occlusions and camera calibration errors. This paper focuses on improving multi-view people detection by developing a supervised view-wise contribution weighting approach that better fuses multi-camera information in large scenes. In addition, a large synthetic dataset is adopted to enhance the model's generalization ability and enable more practical evaluation and comparison. The model's performance on new testing scenes is further improved with a simple domain adaptation technique. Experimental results demonstrate the effectiveness of our approach in achieving promising cross-scene multi-view people detection performance. Code is available at https://vcc.tech/research/2024/MVD.