
Abstract

Multi-view Detection (MVD) is highly effective for occlusion reasoning in a crowded environment. While recent works using deep learning have made significant advances in the field, they have overlooked the generalization aspect, which makes them impractical for real-world deployment. The key novelty of our work is to formalize three critical forms of generalization and propose experiments to evaluate them: generalization with i) a varying number of cameras, ii) varying camera positions, and finally, iii) to new scenes. We find that existing state-of-the-art models show poor generalization by overfitting to a single scene and camera configuration. To address the concerns: (a) we propose a novel Generalized MVD (GMVD) dataset, assimilating diverse scenes with changing daytime, camera configurations, varying number of cameras, and (b) we discuss the properties essential to bring generalization to MVD and propose a barebones model to incorporate them. We perform a comprehensive set of experiments on the WildTrack, MultiViewX, and the GMVD datasets to motivate the necessity to evaluate the generalization abilities of MVD methods and to demonstrate the efficacy of the proposed approach. The code and the proposed dataset can be found at https://github.com/jeetv/GMVD
